A Study on the Search of the Most Discriminative Speech Features in the Speaker Dependent Speech Emotion Recognition

Expressing emotions to others and recognizing the emotional state of a counterpart are not difficult for humans. A person's emotional state may be recognized from facial expression, voice, and/or gesture. Speech emotion recognition research has gained considerable attention in recent years. One of the important subjects in this research is feature selection, since the speech features used greatly influence the recognition rate. In this research, we try to find the most discriminative features for emotion recognition from a set of 78 features. We use these features to study the feature characteristics of individual speakers using a GMM classifier. We obtained an average recognition rate of 71% in the speaker-dependent case and 48% in the speaker-independent case.
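
To make the classification setup concrete, the sketch below shows one common way a GMM-based emotion classifier of this kind is organized: one Gaussian mixture model is trained per emotion on that emotion's feature vectors, and a test sample is assigned to the emotion whose model gives the highest log-likelihood. This is a minimal illustration, not the authors' exact implementation; the emotion label set, the number of mixture components, the covariance type, and the scikit-learn usage are assumptions, while the 78-dimensional feature size is taken from the abstract.

    # Illustrative sketch (not the authors' implementation): one GMM per emotion,
    # classification by maximum log-likelihood over the per-class models.
    import numpy as np
    from sklearn.mixture import GaussianMixture

    EMOTIONS = ["anger", "happiness", "sadness", "neutral"]  # assumed label set
    N_FEATURES = 78    # size of the feature set discussed in the paper
    N_COMPONENTS = 8   # assumed number of mixture components per emotion model

    def train_gmms(features_by_emotion):
        """Fit one GMM per emotion on that emotion's training feature vectors."""
        gmms = {}
        for emotion, X in features_by_emotion.items():
            gmm = GaussianMixture(n_components=N_COMPONENTS,
                                  covariance_type="diag", random_state=0)
            gmm.fit(X)  # X has shape (n_samples, N_FEATURES)
            gmms[emotion] = gmm
        return gmms

    def classify(gmms, x):
        """Return the emotion whose GMM assigns the highest log-likelihood to x."""
        x = np.atleast_2d(x)
        scores = {emotion: gmm.score(x) for emotion, gmm in gmms.items()}
        return max(scores, key=scores.get)

    # Usage with random stand-in data; real feature vectors would come from the
    # speech front end (e.g. prosodic and spectral features per utterance).
    rng = np.random.default_rng(0)
    train = {e: rng.normal(loc=i, size=(200, N_FEATURES)) for i, e in enumerate(EMOTIONS)}
    gmms = train_gmms(train)
    print(classify(gmms, rng.normal(loc=2, size=N_FEATURES)))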
