Comparison of Classification Methods for Detecting Emotion from Mandarin Speech

It is said that technology arises from humanity. What is humanity? At its core, humanity is defined by emotion. Emotion is the basis of all human expression and the underlying theme behind everything that is done, said, thought, or imagined. If computers can perceive and respond to human emotion, human-computer interaction will become more natural. Several classifiers have been adopted for automatically assigning an emotion category, such as anger, happiness, or sadness, to a speech utterance. These classifiers were designed independently and tested on different emotional speech corpora, which makes it difficult to compare and evaluate their performance. In this paper, we first compare several popular classification methods and evaluate their performance on a Mandarin speech corpus covering five basic emotions: anger, happiness, boredom, sadness, and neutral. The extracted feature streams are mel-frequency cepstral coefficients (MFCC), linear prediction cepstral coefficients (LPCC), and linear predictive coding (LPC) coefficients. The experimental results show that the proposed WD-MKNN classifier achieves an accuracy of 81.4% on the 5-class emotion recognition task and outperforms the other classification techniques, including KNN, MKNN, DW-KNN, LDA, QDA, GMM, HMM, SVM, and BPNN. Then, to verify the advantage of the proposed method, we compare the same classifiers on another Mandarin expressive speech corpus consisting of two emotions. On this second corpus, the proposed WD-MKNN again outperforms the other classifiers.
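The abstract does not describe how WD-MKNN works, so the sketch below does not attempt to reproduce it. It only illustrates the classical distance-weighted k-nearest-neighbor rule, one of the baseline families (KNN variants) listed above: the nearest of the k neighbors votes with weight 1, the farthest with weight 0, and the rest are scaled linearly in between. The function name, the two-dimensional feature vectors, and the labels are hypothetical stand-ins for real MFCC/LPCC/LPC features.

```python
import math
from collections import defaultdict


def distance_weighted_knn(train_x, train_y, query, k=3):
    """Classify `query` with a linearly distance-weighted k-NN vote.

    Illustrative baseline only; this is NOT the paper's WD-MKNN.
    """
    # Euclidean distance from the query to every training vector,
    # keeping only the k closest (distance, label) pairs.
    neighbors = sorted(
        ((math.dist(x, query), label) for x, label in zip(train_x, train_y)),
        key=lambda pair: pair[0],
    )[:k]

    d_near, d_far = neighbors[0][0], neighbors[-1][0]
    votes = defaultdict(float)
    for d, label in neighbors:
        # Nearest neighbor gets weight 1, the k-th gets weight 0;
        # if all k distances are equal, every neighbor gets weight 1.
        w = 1.0 if d_far == d_near else (d_far - d) / (d_far - d_near)
        votes[label] += w
    return max(votes, key=votes.get)


# Hypothetical toy data: 2-D points standing in for acoustic feature vectors.
train_x = [(0.0, 0.0), (1.0, 0.0), (1.0, 0.1)]
train_y = ["sadness", "anger", "anger"]

predicted = distance_weighted_knn(train_x, train_y, query=(0.1, 0.0), k=3)
print(predicted)
```

In this toy case an unweighted majority vote with k = 3 would pick "anger" (two of three neighbors), whereas the weighted vote favors the single much closer "sadness" point, which is the kind of behavior that motivates weighting schemes in KNN-based emotion recognition.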
