Recognizing emotions from speech

Automatic Emotion Recognition (AER) from speech is one of the most important subdomains of affective computing. Recent technological advances have enabled humans to interact with computers in ways previously unimaginable: beyond the confines of the keyboard and mouse, new modalities for human-computer interaction such as voice, gesture, and force feedback are emerging. This paper explores the Linear Prediction Coefficients (LPC) of the speech signal for characterizing basic emotions in speech. The emotions used in this study are sadness, anger, happiness, disgust, fear, and boredom. A neural network (NN) is used to capture the emotion-specific information carried by these features. The decrease in error during the training phase of the NNs, together with the emotion recognition performance of the resulting models, demonstrates that the excitation source component of speech contains emotion-specific information and that this information is indeed captured by the NN.
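
To make the pipeline concrete, here is a minimal Python sketch of one plausible reading of it: frame-wise LPC vectors are averaged into an utterance-level feature and fed to a small feed-forward neural network. It assumes librosa and scikit-learn; the frame length, hop, LP order, network topology, and helper names (`lpc_features`, `train_files`) are illustrative assumptions, not details taken from the paper. Only the six emotion labels come from the abstract.

```python
# A minimal sketch of the LPC-plus-NN pipeline described above, assuming
# librosa and scikit-learn. Frame length, hop, LP order, and network
# topology are illustrative choices; the paper does not specify them.
import numpy as np
import librosa
from sklearn.neural_network import MLPClassifier

EMOTIONS = ["sadness", "anger", "happiness", "disgust", "fear", "boredom"]
N_LPC = 12    # assumed LP model order
FRAME = 512   # assumed analysis frame length in samples
HOP = 256     # assumed hop between successive frames

def lpc_features(path):
    """Average frame-wise LPC vectors into one utterance-level feature."""
    y, sr = librosa.load(path, sr=16000)
    frames = librosa.util.frame(y, frame_length=FRAME, hop_length=HOP)
    # librosa.lpc returns order+1 coefficients with a leading 1; drop it.
    coeffs = [librosa.lpc(np.ascontiguousarray(frames[:, i]), order=N_LPC)[1:]
              for i in range(frames.shape[1])]
    return np.mean(coeffs, axis=0)

def train(train_files, train_labels):
    """train_files/train_labels are hypothetical: labelled utterance
    paths and their emotion indices into EMOTIONS."""
    X = np.stack([lpc_features(f) for f in train_files])
    clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000)
    clf.fit(X, train_labels)
    return clf
```

A real experiment would also normalize the features, hold out a test set, and report per-emotion accuracy; the sketch omits those steps for brevity.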
