Research on the Phonetic Emotion Recognition Model of Mandarin Chinese

In recent years, audio-visual emotion recognition (AVER) has become increasingly important in the field of human-computer interaction. Because single-modal information has inherent limitations, we combine audio and visual information to perform multi-modal emotion recognition. At the same time, different classifiers achieve different accuracies in emotion classification experiments. In this paper, we therefore introduce a multi-modal emotion recognition system: after extracting multi-modal features, we train and evaluate several classifiers and identify four with high accuracy for multi-modal emotion recognition, namely the Multi-Layer Perceptron classifier, Logistic Regression, the Support Vector Classifier, and Linear Discriminant Analysis. This paper describes each component of the multi-modal emotion recognition system, focusing on a comparison of classifier performance in emotion recognition.

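As a rough illustration of the classifier comparison described above, the sketch below trains the four named classifiers on fused features using scikit-learn's MLPClassifier, LogisticRegression, SVC, and LinearDiscriminantAnalysis. The feature matrix X and label vector y are synthetic placeholders, since the paper's actual multi-modal features and dataset are not specified here.

# Minimal sketch (assumptions: scikit-learn implementations of the four
# classifiers, placeholder data in place of real fused audio-visual features).
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neural_network import MLPClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 64))      # placeholder fused audio-visual feature vectors
y = rng.integers(0, 4, size=200)    # placeholder labels for four emotion classes

classifiers = {
    "MLP": MLPClassifier(hidden_layer_sizes=(64,), max_iter=1000, random_state=0),
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "SVC": SVC(kernel="rbf"),
    "LDA": LinearDiscriminantAnalysis(),
}

# Compare the classifiers on the same features via 5-fold cross-validated accuracy.
for name, clf in classifiers.items():
    pipeline = make_pipeline(StandardScaler(), clf)
    scores = cross_val_score(pipeline, X, y, cv=5, scoring="accuracy")
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")

With real fused audio-visual features in place of X and y, the same loop yields the kind of per-classifier accuracy comparison the paper reports.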