Bimodal recognition of affective states with the features inspired from human visual and auditory perception system

This article proposes an attention-based mechanism that enhances a biologically inspired network for emotion recognition. Existing bio-inspired models use a multiscale, multiorientation architecture to gain discriminative power and to extract fine-grained visual features. The prevailing HMAX model builds its S2 layer from prototype patches selected at random from the training samples, which increases computational complexity and degrades discriminative ability. Because the eye and mouth regions are the most powerful and reliable cues for determining facial emotions, they serve as the prototype patches for the S2 layer in the proposed HMAX model. An audio codebook is constructed from mel-frequency cepstral coefficients and temporal and spectral features, processed by principal component analysis. The audio and video features are fused to train a support vector machine classifier. Results on the eNTERFACE, Surrey Audio-Visual Expressed Emotion (SAVEE), and Acted Facial Expressions in the Wild (AFEW) datasets confirm the effectiveness of the proposed architecture for emotion recognition.
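
To make the role of the S2 prototype patches concrete, the following is a minimal sketch of how an S2 response and the resulting C2 feature are commonly computed in the standard HMAX formulation: a Gaussian radial-basis similarity between a C1 window and a stored prototype, followed by a global maximum over positions. The abstract does not give the authors' exact formulation, so the function names, the `beta` sharpness parameter, the single-scale map, and the toy values are all illustrative assumptions.

```python
import math

def s2_response(window, prototype, beta=1.0):
    """Gaussian RBF similarity between a C1 window and a prototype patch.

    beta is an assumed sharpness parameter, not a value from the article.
    """
    sq_dist = sum(
        (w - p) ** 2
        for w_row, p_row in zip(window, prototype)
        for w, p in zip(w_row, p_row)
    )
    return math.exp(-beta * sq_dist)

def c2_feature(c1_map, prototype, beta=1.0):
    """Maximum S2 response over every position of the prototype in the C1 map."""
    ph, pw = len(prototype), len(prototype[0])
    h, w = len(c1_map), len(c1_map[0])
    best = 0.0
    for i in range(h - ph + 1):
        for j in range(w - pw + 1):
            window = [row[j:j + pw] for row in c1_map[i:i + ph]]
            best = max(best, s2_response(window, prototype, beta))
    return best

def c2_vector(c1_map, prototypes, beta=1.0):
    """One C2 value per prototype; the vector length equals the prototype count."""
    return [c2_feature(c1_map, p, beta) for p in prototypes]

# Toy single-scale C1 map (hypothetical values, not real image data).
c1_map = [
    [0.0, 0.1, 0.0, 0.0],
    [0.2, 0.9, 0.8, 0.0],
    [0.0, 0.7, 0.6, 0.1],
    [0.0, 0.0, 0.1, 0.0],
]

# Hypothetical prototypes standing in for patches cut from eye and mouth regions.
eye_patch = [[0.9, 0.8], [0.7, 0.6]]    # occurs exactly in c1_map
mouth_patch = [[1.0, 1.0], [1.0, 1.0]]  # only partially matched

features = c2_vector(c1_map, [eye_patch, mouth_patch])
print(features)
```

With a small, fixed set of eye and mouth prototypes, the C2 vector stays short and each entry has a clear interpretation, whereas randomly sampled prototypes need many more patches to cover the same cues. This is one way to read the abstract's claim about reduced computational complexity.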
