Deep Multimodal Fusion: Combining Discrete Events and Continuous Signals

Multimodal datasets often combine continuous signals with series of discrete events. For instance, when studying human behaviour it is common to annotate the actions performed by a participant alongside other modalities such as video recordings of the face or physiological signals. Such events are nominal, infrequent, and not sampled at a fixed rate, whereas signals are numeric and typically sampled at short, regular intervals. This fundamentally different nature complicates the analysis of the relation between these modalities, which is therefore often studied only after each modality has been summarised or reduced. This paper investigates a novel approach to modelling the relation between such modality types that bypasses the need to summarise each modality independently of the others. For that purpose, we introduce a deep learning model based on convolutional neural networks, which we name deep multimodal fusion, adapted to process multiple modalities at different time resolutions. Furthermore, we introduce and compare three alternative methods (convolution, training and pooling fusion) for integrating sequences of events with continuous signals within this model. We evaluate deep multimodal fusion on a game user dataset in which players' physiological signals are recorded in parallel with game events. Results suggest that the proposed architecture can appropriately capture multimodal information, as it yields higher prediction accuracies than single-modality models. In addition, pooling fusion, based on a novel filter-pooling method, appears to provide the most effective fusion approach for the investigated types of data.
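
To make the overall idea concrete, the following is a minimal sketch, not the authors' implementation: it assumes PyTorch, illustrative layer sizes, an event stream already re-encoded as per-type indicators on a coarser time grid, and it stands in an element-wise maximum over per-modality feature maps for the fusion step (a simple pooling-style fusion, not necessarily the paper's filter-pooling operator). It only illustrates how a two-branch convolutional model can bring a densely sampled signal and sparse discrete events to a shared resolution before fusing them.

```python
# Minimal sketch (not the authors' code): a two-branch convolutional model that
# fuses a continuous physiological signal with a discrete event stream.
# Layer sizes, the event encoding, and the max-based fusion are illustrative
# assumptions, not the configuration reported in the paper.
import torch
import torch.nn as nn


class DeepMultimodalFusionSketch(nn.Module):
    def __init__(self, n_event_types: int, n_classes: int = 2):
        super().__init__()
        # Branch for the continuous signal (e.g. skin conductance), sampled densely.
        self.signal_branch = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=9, padding=4),
            nn.ReLU(),
            nn.MaxPool1d(4),          # downsample to the events' coarser resolution
            nn.Conv1d(16, 16, kernel_size=5, padding=2),
            nn.ReLU(),
        )
        # Branch for discrete events, given as one-hot-per-type indicators
        # already resampled onto the coarser time grid.
        self.event_branch = nn.Sequential(
            nn.Conv1d(n_event_types, 16, kernel_size=5, padding=2),
            nn.ReLU(),
        )
        self.classifier = nn.Sequential(
            nn.AdaptiveAvgPool1d(1),
            nn.Flatten(),
            nn.Linear(16, n_classes),
        )

    def forward(self, signal: torch.Tensor, events: torch.Tensor) -> torch.Tensor:
        # signal: (batch, 1, T); events: (batch, n_event_types, T // 4)
        s = self.signal_branch(signal)   # (batch, 16, T // 4)
        e = self.event_branch(events)    # (batch, 16, T // 4)
        fused = torch.maximum(s, e)      # element-wise max as a pooling-style fusion
        return self.classifier(fused)


# Example usage with dummy data.
model = DeepMultimodalFusionSketch(n_event_types=8)
signal = torch.randn(2, 1, 256)          # dense physiological signal
events = torch.zeros(2, 8, 64)           # sparse event indicators
events[:, 3, 10] = 1.0                   # an event of type 3 at coarse step 10
logits = model(signal, events)           # shape (2, 2)
```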
