Acoustic Emotion Recognition Using Linear and Nonlinear Cepstral Coefficients

Recognizing human emotions through the vocal channel has gained increasing attention in recent years. In this paper, we study how the choice of features and classifiers affects the accuracy of recognizing emotions present in speech. Four emotional states are considered for classification. To this end, features are extracted from the audio characteristics of emotional speech using Linear Frequency Cepstral Coefficients (LFCC) and Mel-Frequency Cepstral Coefficients (MFCC). These features are then classified using a Hidden Markov Model (HMM) and a Support Vector Machine (SVM).
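The only difference between the two feature sets named above is the spacing of the filterbank: MFCC places triangular filters on the mel scale, while LFCC spaces them linearly in frequency. A minimal sketch of both extractions follows; the frame sizes, filter counts, and synthetic test tone are illustrative assumptions, not parameters from the paper.

```python
import numpy as np
from scipy.fftpack import dct

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def filterbank(sr, n_fft, n_filters, mel=True):
    # Triangular filters: mel-spaced centres for MFCC, linearly spaced for LFCC.
    if mel:
        pts = mel_to_hz(np.linspace(0.0, hz_to_mel(sr / 2), n_filters + 2))
    else:
        pts = np.linspace(0.0, sr / 2, n_filters + 2)
    bins = np.floor((n_fft + 1) * pts / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        for k in range(l, c):                      # rising slope
            fb[i - 1, k] = (k - l) / max(c - l, 1)
        for k in range(c, r):                      # falling slope
            fb[i - 1, k] = (r - k) / max(r - c, 1)
    return fb

def cepstral_coeffs(signal, sr=16000, n_fft=512,
                    n_filters=26, n_ceps=13, mel=True):
    # Frame (25 ms / 10 ms hop), window, power spectrum,
    # filterbank energies, log, then DCT -> cepstral coefficients.
    frame_len, hop = 400, 160
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop:i * hop + frame_len]
                       for i in range(n_frames)])
    frames = frames * np.hamming(frame_len)
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    energies = power @ filterbank(sr, n_fft, n_filters, mel).T
    energies = np.where(energies == 0.0, np.finfo(float).eps, energies)
    return dct(np.log(energies), type=2, axis=1, norm="ortho")[:, :n_ceps]

# Example: both feature types on a one-second 440 Hz synthetic tone.
t = np.linspace(0.0, 1.0, 16000, endpoint=False)
sig = np.sin(2 * np.pi * 440.0 * t)
mfcc = cepstral_coeffs(sig, mel=True)   # (frames, 13) mel-scale features
lfcc = cepstral_coeffs(sig, mel=False)  # (frames, 13) linear-scale features
print(mfcc.shape, lfcc.shape)
```

The resulting per-frame vectors are what the paper feeds to the HMM (as a frame sequence) or to the SVM (typically after pooling into a fixed-length utterance-level representation).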
