Vocal emotion recognition in five languages of Assam using features based on MFCCs and Eigen Values of Autocorrelation Matrix in presence of babble noise

This work investigates whether, in vocal expressions of emotion, (i) a discrete emotion can be distinguished from ‘no-emotion’ (i.e. neutral), (ii) one discrete emotion can be distinguished from another, (iii) surprise, which is actually a cognitive component that could be present with any emotion, can also be recognized as a distinct emotion, and (iv) a discrete emotion can be recognized cross-lingually. The study is intended to provide more information about the nature and function of emotion. It should also help in developing a generalized vocal emotion recognition system, which would increase the efficiency of human-machine interaction systems. An emotional speech database is created, consisting of short sentences expressing six full-blown basic emotions and neutral, with 140 simulated utterances per speaker, in five native languages of Assam. The database is validated by a listening test. A new feature set is proposed based on the Eigen Values of the Autocorrelation Matrix (EVAM) of each frame of the speech signal. A Gaussian Mixture Model (GMM) is used as the classifier. The performance of the proposed feature set is compared with that of Mel Frequency Cepstral Coefficients (MFCCs) at a sampling frequency of 8.1 kHz, with additive babble noise at Signal-to-Noise Ratios (SNRs) of 5 dB and 0 dB, under matched noise training and testing conditions.
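The two signal-processing steps named above, per-frame eigenvalues of the autocorrelation matrix and additive noise at a target SNR, can be illustrated with a minimal sketch. This is not the authors' implementation: the frame length, hop size, Hamming window, matrix order, biased autocorrelation estimate, and the function names `evam_features` and `add_noise_at_snr` are all assumptions made for illustration, since the abstract does not specify them.

```python
import numpy as np


def evam_features(signal, frame_len=256, hop=128, order=12):
    """Eigenvalues of the autocorrelation matrix of each frame.

    frame_len, hop and order are illustrative assumptions, not values
    taken from the paper.
    """
    window = np.hamming(frame_len)
    lags = np.arange(order)
    # |i - j| index pattern of a symmetric Toeplitz matrix
    toeplitz_idx = np.abs(lags[:, None] - lags[None, :])
    feats = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len] * window
        # Biased autocorrelation estimate for lags 0 .. order-1
        full = np.correlate(frame, frame, mode="full")
        r = full[frame_len - 1:frame_len - 1 + order] / frame_len
        R = r[toeplitz_idx]              # order x order autocorrelation matrix
        eigvals = np.linalg.eigvalsh(R)  # real eigenvalues, ascending
        feats.append(eigvals[::-1])      # store in descending order
    return np.array(feats)


def add_noise_at_snr(clean, noise, snr_db):
    """Scale `noise` so that clean + noise has the requested SNR in dB."""
    if len(noise) < len(clean):
        raise ValueError("noise must be at least as long as the clean signal")
    noise = noise[:len(clean)]
    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2)
    # SNR_dB = 10 * log10(p_clean / (scale**2 * p_noise))
    scale = np.sqrt(p_clean / (p_noise * 10.0 ** (snr_db / 10.0)))
    return clean + scale * noise


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    speech = rng.standard_normal(8000)   # stand-in for a speech signal
    babble = rng.standard_normal(8000)   # stand-in for recorded babble noise
    noisy = add_noise_at_snr(speech, babble, snr_db=5.0)
    print(evam_features(noisy).shape)
```

Under matched conditions, the same noise level would be applied to both the training and the test utterances before feature extraction; the resulting per-frame vectors could then be passed to a GMM classifier.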
