Gender-dependent emotion recognition based on HMMs and SPHMMs

Emotion recognition performance from speech is known to be far from ideal. This work aims to improve that performance by employing a two-stage recognizer that cascades a gender recognizer and an emotion recognizer into one system. Hidden Markov Models (HMMs) and Suprasegmental Hidden Markov Models (SPHMMs) are used as the classifiers in the two-stage recognizer. The recognizer has been tested on two distinct emotional speech databases: our own collected database and the Emotional Prosody Speech and Transcripts database. Six basic emotions, including the neutral state, are used in each database. Our results show that the two-stage approach (a gender-dependent emotion recognizer) significantly improves emotion recognition performance, by an average of 11% over an emotion recognizer without gender information and by 5% over an emotion recognizer given the correct gender information. This work also shows that the highest emotion identification performance occurs when the classifiers are completely biased towards the suprasegmental models, so that the acoustic models have no impact. The results achieved with the two-stage framework fall within 2.28% of those obtained in subjective assessment by human judges.
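The two-stage decision rule described above can be sketched as follows. This is a minimal illustrative sketch only, not the authors' implementation: the score-combination weight `alpha`, the label sets, and the dictionary of per-model log-likelihoods are all assumptions introduced for illustration. The sketch assumes each test utterance has already been scored against gender-dependent HMM (acoustic) and SPHMM (suprasegmental) models, and that the two log-likelihoods are linearly combined, with `alpha = 1.0` corresponding to the reported best case where the classifiers are completely biased towards the suprasegmental models.

```python
# Hypothetical sketch of a two-stage, gender-dependent emotion recognizer.
# All names and the weighting scheme are illustrative assumptions.

GENDERS = ["male", "female"]
EMOTIONS = ["neutral", "angry", "sad", "happy", "disgust", "fear"]

def combined_log_likelihood(acoustic_ll, suprasegmental_ll, alpha):
    """Weighted combination of HMM (acoustic) and SPHMM (suprasegmental)
    log-likelihoods; alpha = 1.0 means the decision is completely biased
    towards the suprasegmental models and the acoustic models have no impact."""
    return (1.0 - alpha) * acoustic_ll + alpha * suprasegmental_ll

def two_stage_recognize(utterance_scores, alpha=1.0):
    """Stage 1: identify the speaker's gender.
    Stage 2: identify the emotion using only that gender's models.

    `utterance_scores` maps (gender, emotion) -> (hmm_ll, sphmm_ll) for one
    test utterance; the gender-level score is taken here as the best
    emotion-model score within each gender (an illustrative choice)."""
    def score(g, e):
        hmm_ll, sphmm_ll = utterance_scores[(g, e)]
        return combined_log_likelihood(hmm_ll, sphmm_ll, alpha)

    gender = max(GENDERS, key=lambda g: max(score(g, e) for e in EMOTIONS))
    emotion = max(EMOTIONS, key=lambda e: score(gender, e))
    return gender, emotion
```

The cascade restricts the stage-2 search to one gender's emotion models, which is what distinguishes the gender-dependent recognizer from a single flat classifier over all gender-emotion pairs.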
