Passive versus active: Vocal classification system

Five expressions are commonly considered to characterize human emotional states: Happiness, Surprise, Anger, Sadness, and Neutral. Various measures can be extracted from speech signals to characterize these expressions, for example pitch, energy, the SPI, and speech rate. Automatic classification of the five expressions based on these features shows substantial confusion between Anger, Surprise, and Happiness on the one hand, and between Neutral and Sadness on the other. The same confusion is observed when humans perform this classification. We therefore propose two classes of expression: Active, gathering Happiness, Surprise, and Anger, versus Passive, gathering Neutral and Sadness. This partition is also better suited to integrating speech information into a multimodal classification system based on speech and video, which is the long-term aim of our work. In this paper, we test several classification methods: a Bayesian classifier, Linear Discriminant Analysis (LDA), K-Nearest Neighbours (KNN), and a Support Vector Machine with a Gaussian radial basis function kernel (SVM). For the two classes considered, the best performance is achieved with the SVM classifier, with recognition rates of 89.74% for the Active state and 86.54% for the Passive state.
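The Active/Passive classification described above can be sketched with an RBF-kernel SVM over prosodic features. This is a minimal illustrative example, not the paper's implementation: the feature layout (pitch, energy, SPI, speech rate), the synthetic data, and the class centroids are all assumptions made for demonstration, using scikit-learn in place of whatever toolkit the authors used.

```python
# Hypothetical sketch: binary Active/Passive emotion classification with an
# SVM using a Gaussian RBF kernel, as in the abstract. All data is synthetic.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Each row holds placeholder features: [pitch, energy, SPI, speech rate].
# The centroids below are invented purely to create two separable clusters.
X_active = rng.normal(loc=[220.0, 0.8, 0.3, 5.5], scale=0.5, size=(40, 4))
X_passive = rng.normal(loc=[180.0, 0.4, 0.6, 3.5], scale=0.5, size=(40, 4))
X = np.vstack([X_active, X_passive])
y = np.array([1] * 40 + [0] * 40)  # 1 = Active, 0 = Passive

# Standardize features, then fit the RBF-kernel SVM.
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", gamma="scale"))
clf.fit(X, y)

# Classify two new utterances, one near each cluster center.
print(clf.predict([[220.0, 0.8, 0.3, 5.5]]))  # expected: Active (1)
print(clf.predict([[180.0, 0.4, 0.6, 3.5]]))  # expected: Passive (0)
```

In practice the features would be extracted from real speech recordings and the classifier evaluated with held-out data to obtain recognition rates comparable to those reported.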
