Multimodal Emotion Recognition by Extracting Common and Modality-Specific Information

Emotion recognition technologies are widely used in areas such as advertising, healthcare, and online education. Previous works typically recognize emotion from either the acoustic or the visual signal alone, yielding unsatisfactory performance and limiting their applications. To improve inference capability, we present a multimodal emotion recognition model, EMOdal. Beyond learning from the audio and visual data separately, EMOdal efficiently learns the common and modality-specific information underlying the two kinds of signals, and thereby improves its inference ability. The model has been evaluated on our large-scale emotional dataset, and comprehensive evaluations demonstrate that it outperforms traditional approaches.
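The idea of splitting a fused representation into common and modality-specific parts can be sketched as follows. This is a minimal illustrative example, not the paper's actual architecture: all dimensions, weights, and the simple tanh encoders are assumptions, and a real model would train the common encoders with an alignment objective so that the two modalities map into a shared space.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical feature dimensions (illustrative, not from the paper).
D_AUDIO, D_VISUAL, D_COMMON, D_SPECIFIC = 40, 64, 16, 8

# "Common" encoders: one per modality, intended to project both
# signals into a shared space (in practice, trained with an
# alignment or similarity loss between their outputs).
W_common_a = rng.standard_normal((D_AUDIO, D_COMMON))
W_common_v = rng.standard_normal((D_VISUAL, D_COMMON))

# Modality-specific encoders: capture information unique to each signal.
W_spec_a = rng.standard_normal((D_AUDIO, D_SPECIFIC))
W_spec_v = rng.standard_normal((D_VISUAL, D_SPECIFIC))

def encode(x_audio: np.ndarray, x_visual: np.ndarray) -> np.ndarray:
    """Fuse common and modality-specific representations into one vector."""
    # Shared information: combine the two modalities' projections.
    common = np.tanh(x_audio @ W_common_a) + np.tanh(x_visual @ W_common_v)
    # Private information: kept separate per modality.
    spec_a = np.tanh(x_audio @ W_spec_a)
    spec_v = np.tanh(x_visual @ W_spec_v)
    # The fused vector would feed a downstream emotion classifier.
    return np.concatenate([common, spec_a, spec_v])

fused = encode(rng.standard_normal(D_AUDIO), rng.standard_normal(D_VISUAL))
print(fused.shape)  # (32,)
```

The fused vector concatenates one shared block with two private blocks, so the classifier can draw on both what the signals agree about and what each contributes alone.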
