Emotion Analysis Using Audio/Video, EMG and EEG: A Dataset and Comparison Study

This paper describes a study of automated emotion recognition using four modalities: audio, video, electromyography (EMG), and electroencephalography (EEG). We collected a dataset across all four modalities while 12 human subjects expressed six different emotions or maintained a neutral expression. Three aspects of emotion recognition were investigated: model selection, feature selection, and data selection. Both generative models (deep belief networks, DBNs) and discriminative models (long short-term memory networks, LSTMs) were applied to each modality. From these analyses we conclude that LSTMs perform better on audio and video when paired with their corresponding sophisticated feature extractors (MFCCs and CNNs), whereas DBNs perform better on both EMG and EEG. By examining these signals at different stages (pre-speech, during-speech, and post-speech) of the current and following trials, we found that the most effective window for emotion recognition from EEG occurs after the emotion has been expressed, suggesting that the neural signals conveying an emotion are long-lasting.
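
To make the audio pipeline concrete, the sketch below shows a minimal version of the discriminative route the abstract describes: MFCC features feeding an LSTM classifier over seven classes (six emotions plus neutral). The library choices (librosa, PyTorch), the hypothetical file name utterance.wav, and all hyperparameters are illustrative assumptions; the paper does not specify its implementation.

    # Minimal sketch of the MFCC + LSTM audio route described in the abstract.
    # Library choices and hyperparameters are assumptions, not the paper's code.
    import librosa
    import torch
    import torch.nn as nn

    NUM_CLASSES = 7  # six emotions + neutral, per the data collection protocol

    class AudioEmotionLSTM(nn.Module):
        def __init__(self, n_mfcc=13, hidden=64, num_classes=NUM_CLASSES):
            super().__init__()
            self.lstm = nn.LSTM(input_size=n_mfcc, hidden_size=hidden,
                                batch_first=True)
            self.head = nn.Linear(hidden, num_classes)

        def forward(self, x):           # x: (batch, time, n_mfcc)
            _, (h_n, _) = self.lstm(x)  # h_n: (num_layers, batch, hidden)
            return self.head(h_n[-1])   # class logits from final hidden state

    # Extract MFCCs from one utterance and score it (file name is hypothetical).
    waveform, sr = librosa.load("utterance.wav", sr=16000)
    mfcc = librosa.feature.mfcc(y=waveform, sr=sr, n_mfcc=13)       # (13, time)
    features = torch.tensor(mfcc.T, dtype=torch.float32).unsqueeze(0)

    model = AudioEmotionLSTM()
    logits = model(features)
    print(logits.argmax(dim=-1))  # predicted class index (untrained: arbitrary)

The generative counterpart would swap the LSTM for a DBN trained on EMG or EEG features, which the abstract reports worked better for those physiological modalities.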
