Speaker identification in emotional talking environments using both gender and emotion cues

Speaker recognition performance is typically very high in neutral talking environments; however, it degrades significantly in emotional talking environments. This work proposes, implements, and evaluates a new approach to improving the degraded performance of text-independent speaker identification in emotional talking environments. The proposed approach identifies the unknown speaker using both gender and emotion cues, with Hidden Markov Models (HMMs) as classifiers. The approach has been tested on our collected speech database. The results show that speaker identification performance based on both gender and emotion cues is higher than that based on gender cues alone, emotion cues alone, or neither cue. The results obtained with the proposed approach are close to those obtained in subjective evaluation by human judges.
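
To make the cascade concrete, the following is a minimal sketch of using gender and emotion cues ahead of speaker identification. It assumes MFCC-style frame features, the hmmlearn library, and maximum-likelihood selection among per-class Gaussian HMMs; the abstract does not specify the feature type, state counts, or model topology, so all of these choices are illustrative assumptions rather than the paper's actual configuration.

```python
# Sketch of a cascaded gender -> emotion -> speaker HMM pipeline, under the
# assumptions stated above (hmmlearn, Gaussian HMMs, MFCC-like frame features).
import numpy as np
from hmmlearn import hmm


def train_hmm(sequences, n_states=3):
    """Fit one Gaussian HMM on a list of (n_frames, n_features) arrays."""
    X = np.vstack(sequences)                  # stack all training utterances
    lengths = [len(s) for s in sequences]     # per-utterance frame counts
    model = hmm.GaussianHMM(n_components=n_states,
                            covariance_type="diag", n_iter=20)
    model.fit(X, lengths)
    return model


def best_label(models, features):
    """Return the label whose HMM scores the utterance with the highest
    log-likelihood."""
    return max(models, key=lambda label: models[label].score(features))


def identify_speaker(features, gender_models, emotion_models, speaker_models):
    # Stage 1: gender cue -- pick the more likely gender model.
    gender = best_label(gender_models, features)
    # Stage 2: emotion cue -- score only the emotion models for that gender.
    emotion = best_label(emotion_models[gender], features)
    # Stage 3: speaker ID within the matching (gender, emotion) model set.
    return best_label(speaker_models[(gender, emotion)], features)
```

Each stage narrows the candidate model set, so the final speaker decision is scored only against models trained under the matching gender and emotion condition, which is the intuition behind combining the two cues.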
