Recognition of Emotions from Speech using Excitation Source Features

Abstract This paper introduces a speech database for characterizing the emotions present in speech. The semi-natural database, GEU-SNESC (GEU Semi Natural Emotion Speech Corpus), is used to obtain emotion-specific information, with Linear Prediction (LP) residual samples as features. The corpus was collected by recording dialogues of popular film actors and actresses from Hindi movies. Four emotions are considered in this study: sadness, anger, happiness, and neutral. The LP residual is obtained through LP analysis, that is, by inverse filtering the speech signal. Gaussian mixture models (GMMs) are used to capture the emotion-specific information from the higher-order relations present in the LP residual. The emotion recognition performance is observed to be about 50-60%.
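To make the inverse-filtering step concrete, the following is a minimal sketch of LP residual extraction for a single frame, using the autocorrelation method with the Levinson-Durbin recursion. The abstract does not state the LP order, frame length, or windowing, so the function names, `order=10`, and the synthetic test signal are illustrative assumptions rather than the authors' configuration. The GMM modelling stage is not shown.

```python
import random


def autocorrelation(frame, max_lag):
    """Autocorrelation r[0..max_lag] of one frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LP normal equations.

    Returns a with a[0] = 1, defining the inverse filter
    A(z) = 1 + a[1] z^-1 + ... + a[order] z^-order.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:  # degenerate (e.g. all-zero) frame
            break
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a


def lp_residual(frame, order=10):
    """Inverse-filter the frame: e[n] = sum_k a[k] * s[n - k].

    order=10 is an assumed value, not taken from the paper.
    """
    r = autocorrelation(frame, order)
    a = levinson_durbin(r, order)
    residual = [
        sum(a[k] * frame[n - k] for k in range(min(order, n) + 1))
        for n in range(len(frame))
    ]
    return a, residual


# Usage on a synthetic second-order autoregressive signal (not speech data).
random.seed(0)
signal = [0.0, 0.0]
for _ in range(400):
    signal.append(1.3 * signal[-1] - 0.6 * signal[-2] + random.gauss(0.0, 1.0))
coeffs, residual = lp_residual(signal, order=2)
print("LP coefficients:", [round(c, 3) for c in coeffs])
```

In a system of the kind the abstract describes, residual samples from many frames of each emotion class would then be used to train one GMM per emotion, and a test utterance would be assigned to the class whose model gives the highest likelihood.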
