Improving Automatic Emotion Recognition from Speech Using Rhythm and Temporal Features

This paper improves automatic emotion recognition from speech by incorporating rhythm and temporal features. Research on automatic emotion recognition has so far mostly relied on features such as MFCCs, pitch, and energy or intensity. Our idea is to borrow rhythm features from linguistic and phonetic analysis and apply them to the speech signal on the basis of acoustic knowledge alone. In addition, we exploit a set of temporal and loudness features. A segmentation unit is first employed to separate the voiced, unvoiced, and silent parts, and features are extracted on the different segments. Several classifiers are then used for classification. After selecting the top features with an information gain ratio (IGR) filter, we achieve a recognition rate of 80.60% on the Berlin Emotion Database in the speaker-dependent framework.
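The IGR filter mentioned above ranks features by their information gain normalized by the split entropy, which counteracts the bias of plain information gain toward many-valued features. As a rough illustration (not the paper's implementation, which operates on the full acoustic feature set), the gain ratio of a discretized feature with respect to the emotion labels can be computed as follows:

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain_ratio(feature_values, labels):
    """Gain ratio of a discrete feature with respect to class labels.

    gain ratio = (H(labels) - H(labels | feature)) / H(feature)
    Returns 0.0 when the feature takes a single value (split info is 0).
    """
    n = len(labels)
    base = entropy(labels)
    conditional = 0.0
    split_info = 0.0
    for value in set(feature_values):
        subset = [lab for f, lab in zip(feature_values, labels) if f == value]
        p = len(subset) / n
        conditional += p * entropy(subset)
        split_info -= p * math.log2(p)
    gain = base - conditional
    return gain / split_info if split_info > 0 else 0.0
```

A feature that perfectly separates the classes (e.g. `[0, 0, 1, 1]` against labels `['a', 'a', 'b', 'b']`) yields a gain ratio of 1.0, while an uninformative feature yields 0.0; ranking all candidate features by this score and keeping the top-scoring ones is the essence of an IGR filter.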
