Speaker dependent, speaker independent and cross language emotion recognition from speech using GMM and HMM

This paper analyses emotion recognition from speech in speaker-dependent, speaker-independent, text-dependent, text-independent, language-dependent, and cross-language modes. Gaussian Mixture Models (GMMs) and Hidden Markov Models (HMMs) are used as the classification models. The IITKGP-SESC and IITKGP-SEHSC emotional speech corpora are used to carry out these studies. The emotions considered are anger, disgust, fear, happy, neutral, sarcastic, and surprise. Mel Frequency Cepstral Coefficient (MFCC) features are used to identify the emotions. Emotion recognition performance in the speaker-dependent mode is better than in the speaker-independent and cross-language modes. The results show that emotion recognition performance depends on the speaker and the language.
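As an illustration of the GMM-based classification named above, the following is a minimal sketch and not the authors' implementation. It assumes one diagonal-covariance GMM per emotion, trained with a few EM iterations, and labels an utterance with the emotion whose model gives the highest average frame log-likelihood. The feature vectors are synthetic stand-ins for MFCC frames (no corpus data is used), and all function names, the component count, the iteration count, and the variance floor are hypothetical choices made for the example.

```python
import math
import random

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at frame x."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))

def logsumexp(vals):
    """Numerically stable log(sum(exp(v))) over a list of floats."""
    top = max(vals)
    return top + math.log(sum(math.exp(v - top) for v in vals))

def component_log_likelihoods(x, gmm):
    """Per-component log(weight) + log density for one frame."""
    return [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm]

def train_gmm(frames, n_components=2, n_iter=20, var_floor=1e-3, seed=0):
    """Fit a diagonal GMM with plain EM (assumed settings, not from the paper)."""
    rng = random.Random(seed)
    dim = len(frames[0])
    # Initialise means from randomly chosen frames, unit variances, equal weights.
    gmm = [(1.0 / n_components, list(m), [1.0] * dim)
           for m in rng.sample(frames, n_components)]
    for _ in range(n_iter):
        # E-step: responsibility of each component for each frame.
        resp = []
        for x in frames:
            ll = component_log_likelihoods(x, gmm)
            norm = logsumexp(ll)
            resp.append([math.exp(l - norm) for l in ll])
        # M-step: re-estimate weights, means, and floored variances.
        updated = []
        for k in range(n_components):
            nk = sum(r[k] for r in resp) + 1e-10
            mean = [sum(r[k] * x[d] for r, x in zip(resp, frames)) / nk
                    for d in range(dim)]
            var = [max(var_floor,
                       sum(r[k] * (x[d] - mean[d]) ** 2
                           for r, x in zip(resp, frames)) / nk)
                   for d in range(dim)]
            updated.append((max(nk / len(frames), 1e-10), mean, var))
        gmm = updated
    return gmm

def utterance_score(frames, gmm):
    """Average per-frame log-likelihood of an utterance under one GMM."""
    return sum(logsumexp(component_log_likelihoods(x, gmm))
               for x in frames) / len(frames)

def classify(frames, models):
    """Return the emotion whose GMM scores the utterance highest."""
    return max(models, key=lambda emo: utterance_score(frames, models[emo]))

def synthetic_frames(center, n, dim, rng):
    """Synthetic stand-in for MFCC frames: Gaussian noise around a center."""
    return [[rng.gauss(center, 1.0) for _ in range(dim)] for _ in range(n)]

rng = random.Random(42)
DIM = 4  # toy dimensionality; real MFCC vectors are usually longer
centers = {"anger": 3.0, "neutral": 0.0, "fear": -3.0}
models = {emo: train_gmm(synthetic_frames(c, 200, DIM, rng))
          for emo, c in centers.items()}

test_utterance = synthetic_frames(3.0, 50, DIM, rng)
predicted = classify(test_utterance, models)
print(predicted)
```

In this sketch, a speaker-dependent setup would train and test on frames from the same speaker, while a speaker-independent or cross-language setup would draw the test frames from a speaker or language unseen during training. Only the data split changes, not the classifier.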
