Hierarchical Classification of Emotional Speech

Speech emotions such as anger, boredom, fear, and gladness carry high-level semantic information, and their automatic analysis has many potential applications, such as smart human-computer interaction or multimedia indexing. The main difficulty in efficient speech emotion classification lies in the complex borders between emotional classes, which makes appropriate audio feature selection necessary. Whereas current work in the literature relies only on classical frequency- and energy-based features and uses a global classifier with an identical feature set for all emotion classes, we propose in this paper new harmonic and Zipf-based features for better emotion class characterization, together with a hierarchical classification scheme, motivated by our finding that different emotional classes need different feature sets for better discrimination. Evaluated on the Berlin dataset [11] with 68 features, our emotion classifier reaches a classification rate of 76.22%, and up to 79.47% when a gender classification stage is applied first, whereas current works in the literature usually report, as far as we know, classification rates from 55% to 70%.
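As a rough illustration of the two-stage scheme described above, the following sketch routes each sample through a gender classifier first, then through a gender-specific emotion classifier trained on its own feature subset. Everything here is hypothetical: the feature values are synthetic, the nearest-centroid classifiers are a stand-in for whatever the paper actually uses, and the feature-subset indices are placeholders, not the paper's 68 selected features.

```python
import random
import statistics

random.seed(0)

EMOTIONS = ["anger", "boredom", "disgust", "fear", "gladness", "sadness", "neutral"]

def sample(gender, emotion):
    """Synthetic 68-dim feature vector standing in for the real
    harmonic/Zipf/frequency/energy features (values are illustrative only)."""
    vec = [random.gauss(0.0, 1.0) for _ in range(68)]
    vec[0] += 3.0 if gender == "female" else -3.0      # crude gender cue
    vec[1 + EMOTIONS.index(emotion)] += 3.0            # crude emotion cue
    return vec

data = [(g, e, sample(g, e))
        for g in ("male", "female")
        for e in EMOTIONS
        for _ in range(20)]

# Hypothetical per-gender feature subsets: placeholders for the different
# features selected for each branch of the hierarchy.
SUBSETS = {"male": list(range(0, 34)),
           "female": list(range(0, 8)) + list(range(42, 68))}

def centroids(rows, dims):
    """Mean vector per label, restricted to the given feature dimensions."""
    by_label = {}
    for label, vec in rows:
        by_label.setdefault(label, []).append([vec[i] for i in dims])
    return {lbl: [statistics.fmean(col) for col in zip(*vecs)]
            for lbl, vecs in by_label.items()}

def nearest(cents, vec, dims):
    """Label of the centroid closest (squared Euclidean) to vec over dims."""
    v = [vec[i] for i in dims]
    return min(cents, key=lambda lbl: sum((a - b) ** 2
                                          for a, b in zip(cents[lbl], v)))

ALL_DIMS = range(68)
gender_cents = centroids([(g, v) for g, _, v in data], ALL_DIMS)
emotion_cents = {g: centroids([(e, v) for gg, e, v in data if gg == g],
                              SUBSETS[g])
                 for g in ("male", "female")}

def classify(vec):
    """Stage 1: predict gender; stage 2: predict emotion with that
    gender's own centroids and feature subset."""
    g = nearest(gender_cents, vec, ALL_DIMS)
    return nearest(emotion_cents[g], vec, SUBSETS[g])

pred = classify(sample("female", "fear"))
```

The point of the design is that a misrouted sample at stage 1 is handed to the wrong stage-2 classifier, so the gender stage must be accurate for the hierarchy to pay off, which is consistent with the reported gain from 76.22% to 79.47%.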

[1] Zhongzhe Xiao et al., "Two-Stage Classification of Emotional Speech," International Conference on Digital Telecommunications (ICDT'06), 2006.

[2] Malcolm Slaney et al., "Baby Ears: A Recognition System for Affective Vocalizations," Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '98), 1998.

[3] Roddy Cowie et al., "Automatic Recognition of Emotion from Voice: A Rough Benchmark," 2000.

[4] George Kingsley Zipf, Human Behavior and the Principle of Least Effort, 1949.

[5] Alex Waibel et al., "Emotion-Sensitive Human-Computer Interfaces," 2000.

[6] E. Owens et al., An Introduction to the Psychology of Hearing, 1997.

[7] Nicole Vincent et al., "Zipf Analysis of Audio Signals," 2004.

[8] Cecile Pereira, "Dimensions of Emotional Meaning in Speech," 2000.

[9] Paul Sajda et al., "Role of Feature Selection in Building Pattern Recognizers for Computer-Aided Diagnosis," Medical Imaging, 1998.

[10] Constantine Kotropoulos et al., "Automatic Speech Classification to Five Emotional States Based on Gender Information," 12th European Signal Processing Conference, 2004.

[11] Ioannis Pitas et al., "Automatic Emotional Speech Classification," 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2004.

[12] Paul Boersma, "Praat, a System for Doing Phonetics by Computer," 2002.

[13] Klaus R. Scherer, "Vocal Communication of Emotion: A Review of Research Paradigms," Speech Communication, 2003.