
Multi-modal Deep Learning


In many machine learning problems, the same underlying phenomenon can be observed through different types of data. We call such problems "multi-modal," since they offer multiple views of the same problem. However, many current deep learning techniques are not designed for multi-modal data, and here we study how to use more than one dataset to improve the performance of deep learning models. In particular, we demonstrate that deep neural network performance can be improved with our proposed multi-modal learning approach compared with single-modal learning. We also explore effective and efficient ways to combine different data modalities. This research demonstrates our techniques on several multi-modal problems, including autoencoders trained on the MNIST dataset and deep neural network classifiers trained on real-world chemistry data. We propose the no-harm factor, a method that ensures adding another modality does not harm model performance when only a small amount of multi-modal data is available. The no-harm factor is easy to apply and practical, especially in chemical analysis tasks where labeled data are limited.
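The abstract describes combining different data modalities in a single model. One common baseline for this (a generic technique, not necessarily the specific method proposed in the thesis) is early fusion: each modality is passed through its own encoder, and the resulting embeddings are concatenated into a joint representation for a downstream classifier. The sketch below is purely illustrative; the encoder weights, feature sizes, and the pairing of an image modality with a spectrum modality are all assumptions chosen to echo the MNIST and chemistry examples mentioned above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical linear encoders standing in for learned per-modality networks.
W_image = rng.standard_normal((784, 16))     # e.g. a flattened MNIST digit
W_spectrum = rng.standard_normal((100, 16))  # e.g. a measured chemical spectrum

def encode_image(x):
    """Map an image batch (n, 784) to a 16-dim embedding."""
    return x @ W_image

def encode_spectrum(x):
    """Map a spectrum batch (n, 100) to a 16-dim embedding."""
    return x @ W_spectrum

def early_fusion(img_batch, spec_batch):
    """Combine two modalities by concatenating their embeddings.

    The joint (n, 32) representation would then feed a shared
    classifier head trained on both modalities at once.
    """
    return np.concatenate(
        [encode_image(img_batch), encode_spectrum(spec_batch)], axis=1
    )

img = rng.standard_normal((4, 784))
spec = rng.standard_normal((4, 100))
joint = early_fusion(img, spec)
print(joint.shape)  # (4, 32)
```

Early fusion is only one point in the design space; the abstract's mention of "effective and efficient ways to combine different data modalities" suggests the thesis compares several such fusion strategies.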

Identifier
  • etd-63726
Year
  • 2022
Date created
  • 2022-04-26


Permanent link to this page: https://digital.wpi.edu/show/1g05ff75p