COMPARISON BETWEEN GMM-SVM SEQUENCE KERNEL AND GMM: APPLICATION TO SPEECH EMOTION RECOGNITION

Speech emotion recognition aims at automatically identifying the emotional or physical state of a human being from his or her voice. The emotional state is an important factor in human communication, because it provides feedback in many applications. This paper compares two standard methods originally used for speaker recognition and verification, Gaussian Mixture Models (GMM) and Support Vector Machines (SVM), in the context of emotion recognition. An extensive comparison of the GMM classifier and the GMM-SVM sequence kernel is conducted. The main goal is to analyze and compare the influence of the initial parameter settings, such as the number of mixture components, the number of training iterations, and the volume of training data, on these two methods. Experiments are performed on the Berlin Emotional Database, which contains German utterances expressing different emotions. The emotions used in this study are anger, fear, joy, boredom, neutral, disgust, and sadness. Experimental results show the effectiveness of combining GMM and SVM for classifying sound data sequences, compared to systems based on GMM alone.
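
To make the contrast between the two compared classifiers concrete, the sketch below is a minimal, hypothetical illustration rather than the authors' exact pipeline: it trains one diagonal-covariance GMM per emotion and decides by maximum average log-likelihood, and, for the GMM-SVM sequence-kernel approach, stacks per-utterance GMM component means into fixed-length supervectors classified with a linear SVM. It assumes scikit-learn's GaussianMixture and SVC as stand-ins for the EM-trained GMMs and the SVM, uses synthetic MFCC-like frames (fake_utterance is a placeholder for real feature extraction), and skips the MAP adaptation from a universal background model that is typically used to keep supervector dimensions aligned across utterances.

    # Minimal sketch (not the paper's exact pipeline) contrasting a GMM baseline
    # with a GMM-SVM supervector classifier on synthetic MFCC-like frames.
    import numpy as np
    from sklearn.mixture import GaussianMixture
    from sklearn.svm import SVC

    EMOTIONS = ["anger", "fear", "joy", "boredom", "neutral", "disgust", "sadness"]
    N_MIX, N_DIM = 8, 13          # number of mixture components, MFCC dimension (illustrative)
    rng = np.random.default_rng(0)

    def fake_utterance(label_idx, n_frames=200):
        # Synthetic stand-in for one utterance's MFCC frames.
        return rng.normal(loc=label_idx, scale=1.0, size=(n_frames, N_DIM))

    train = [(fake_utterance(i), e) for i, e in enumerate(EMOTIONS) for _ in range(5)]
    test = [(fake_utterance(i), e) for i, e in enumerate(EMOTIONS) for _ in range(2)]

    # Baseline: one GMM per emotion, decide by maximum average log-likelihood.
    gmms = {}
    for emo in EMOTIONS:
        frames = np.vstack([x for x, lab in train if lab == emo])
        gmms[emo] = GaussianMixture(n_components=N_MIX, covariance_type="diag",
                                    max_iter=50, random_state=0).fit(frames)

    def gmm_predict(frames):
        return max(EMOTIONS, key=lambda emo: gmms[emo].score(frames))

    # GMM-SVM: per-utterance GMM mean supervectors classified with a linear SVM.
    def supervector(frames):
        g = GaussianMixture(n_components=N_MIX, covariance_type="diag",
                            max_iter=50, random_state=0).fit(frames)
        return g.means_.ravel()   # stacked component means as a fixed-length vector

    X_tr = np.array([supervector(x) for x, _ in train])
    y_tr = [lab for _, lab in train]
    svm = SVC(kernel="linear").fit(X_tr, y_tr)

    gmm_acc = np.mean([gmm_predict(x) == lab for x, lab in test])
    svm_acc = np.mean([svm.predict([supervector(x)])[0] == lab for x, lab in test])
    print(f"GMM baseline accuracy: {gmm_acc:.2f}, GMM-SVM supervector accuracy: {svm_acc:.2f}")

The two initial settings discussed in the abstract, the number of mixture components and the number of training iterations, correspond here to n_components and max_iter of GaussianMixture; the volume of training data corresponds to the number of utterances per emotion.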
