Teager Mel and PLP Fusion Feature Based Speech Emotion Recognition

Although many features derived from linear speech production theory have been investigated as indicators of emotion in speech, recognition accuracy remains unsatisfactory for realistic applications. This paper proposes Teager Mel, a novel speech emotion feature based on the Teager Energy Operator (TEO) and Mel-scale perceptual characteristics. Because the TEO is nonlinear and simple to compute, it appears well suited to describing emotion in speech. From an auditory psychophysical point of view, Perceptual Linear Predictive (PLP) features are also investigated as a complement to Teager Mel. A Support Vector Machine (SVM) classifier is then applied to the fusion of Teager Mel and PLP features on a Chinese discrete emotional speech corpus (Dis-EC) that covers four emotions: happiness, anger, sorrow and surprise. Compared with previous studies based on prosodic features, Teager Mel features improve recognition accuracy by 10.4%, and PLP features improve it by 8.2%. With the fused features, recognition accuracy reaches 79.7%, which appears to be the most competitive result among related studies.
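As a rough illustration of why the TEO is described as nonlinear and simple, the sketch below implements the standard discrete form of the operator, psi[n] = x[n]^2 - x[n-1]*x[n+1], in plain Python. It is not the paper's Teager Mel pipeline, whose details the abstract does not give; the function name `teager_energy` and the test-tone parameters (amplitude 0.5, 200 Hz, 8000 Hz sampling rate) are assumptions chosen for the example.

```python
import math


def teager_energy(x):
    """Discrete Teager Energy Operator: psi[n] = x[n]^2 - x[n-1] * x[n+1].

    Returns one value per interior sample, so the output is two samples
    shorter than the input.
    """
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


# Hypothetical test signal: a pure tone A * cos(omega * n).
amplitude = 0.5
omega = 2 * math.pi * 200 / 8000  # digital frequency in radians per sample
tone = [amplitude * math.cos(omega * n) for n in range(64)]

psi = teager_energy(tone)

# For a pure tone the operator output is constant: (A * sin(omega))^2.
expected = (amplitude * math.sin(omega)) ** 2
```

Each output sample needs only three neighbouring input samples and two multiplications, and for a pure tone the result depends on both amplitude and frequency, which is the property that makes the operator attractive as an energy measure.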
