Speech Emotion Recognition Using Residual Phase and MFCC Features

The main objective of this research is to develop a speech emotion recognition (SER) system using residual phase and MFCC features with an autoassociative neural network (AANN). The system classifies speech into predefined emotion categories: anger, fear, happiness, neutral, and sadness. The proposed technique has two phases: feature extraction and classification. First, the speech signal is passed to the feature extraction phase, where residual phase and MFCC features are computed. Based on the feature vectors extracted from the training data, an AANN is trained to classify the emotions into anger, fear, happiness, neutral, or sadness. The performance of the proposed technique is evaluated in terms of false acceptance rate (FAR) and false rejection rate (FRR). The experimental results show that the residual phase features give an equal error rate (EER) of 41.0%, while the MFCC features give an EER of 20.0%. By combining the residual phase and MFCC features at the matching-score level, an EER of 16.0% is obtained.

Keywords: Mel frequency cepstral coefficients, residual phase, autoassociative neural network, speech emotion recognition.
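The abstract reports performance as an equal error rate (EER), the operating point where FAR and FRR coincide, and combines the two feature streams at the matching-score level. The sketch below illustrates both ideas under stated assumptions: the function names (`eer`, `fuse`) and the equal-weight fusion rule are hypothetical, since the paper does not specify its fusion weights or threshold-sweep procedure.

```python
import numpy as np

def eer(genuine, impostor):
    """Estimate the equal error rate from matching scores.

    Sweeps a decision threshold over all observed scores;
    FAR = fraction of impostor scores accepted (score >= threshold),
    FRR = fraction of genuine scores rejected (score < threshold).
    Returns the error rate where FAR and FRR are closest.
    """
    thresholds = np.sort(np.concatenate([genuine, impostor]))
    far = np.array([(impostor >= t).mean() for t in thresholds])
    frr = np.array([(genuine < t).mean() for t in thresholds])
    idx = np.argmin(np.abs(far - frr))
    return (far[idx] + frr[idx]) / 2.0

def fuse(score_phase, score_mfcc, w=0.5):
    """Weighted-sum fusion at the matching-score level.

    The weight w is an illustrative assumption, not a value
    taken from the paper.
    """
    return w * score_phase + (1.0 - w) * score_mfcc
```

With well-separated genuine and impostor score distributions the EER approaches zero; overlapping distributions push it upward, which is how the individual residual-phase (41.0%) and MFCC (20.0%) systems, and their 16.0% fusion, would be measured in practice.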
