Employing Emotion Cues to Verify Speakers in Emotional Talking Environments

Abstract People usually talk neutrally in environments free of abnormal talking conditions such as stress and emotion. Emotional conditions that can affect a person's talking tone include happiness, anger, and sadness, and such emotions are directly influenced by the speaker's health status. Speakers can be verified easily in neutral talking environments but not in emotional ones; consequently, speaker verification systems perform worse in emotional talking environments than in neutral ones. In this work, a two-stage approach is proposed and evaluated to improve speaker verification performance in emotional talking environments. The approach exploits the speaker's emotion cues (a text-independent, emotion-dependent speaker verification problem), using both hidden Markov models (HMMs) and suprasegmental HMMs as classifiers. It consists of two cascaded stages that integrate an emotion recognizer and a speaker recognizer into a single recognizer. The architecture has been tested on two separate emotional speech databases: our collected database and the Emotional Prosody Speech and Transcripts database. The results show that the proposed approach achieves a significant improvement over previous studies and over alternative approaches, such as emotion-independent speaker verification and emotion-dependent speaker verification based entirely on HMMs.
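
The cascade described in the abstract can be summarized in code. The sketch below is illustrative only: the Model class, the verify_speaker function, and the Gaussian scorer are hypothetical stand-ins (the paper's actual classifiers are HMMs and suprasegmental HMMs), and only the two-stage control flow, recognizing the emotion first and then verifying the claimed speaker with the emotion-specific model, follows the approach described above.

```python
# Minimal sketch of the two-stage cascade, under the assumptions stated above.
# The Gaussian scorer stands in for a trained HMM / suprasegmental HMM so the
# sketch runs end to end; a real system would score with the forward algorithm.

import math
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Model:
    """Stand-in scorer for a trained (suprasegmental) HMM."""
    mean: float
    var: float

    def log_likelihood(self, features: List[float]) -> float:
        # Single diagonal Gaussian instead of the forward algorithm.
        return sum(
            -0.5 * (math.log(2.0 * math.pi * self.var)
                    + (x - self.mean) ** 2 / self.var)
            for x in features
        )


def verify_speaker(
    features: List[float],
    emotion_models: Dict[str, Model],             # stage 1: one model per emotion
    speaker_models: Dict[str, Dict[str, Model]],  # stage 2: speaker -> emotion -> model
    background_models: Dict[str, Model],          # impostor model per emotion
    claimed_speaker: str,
    threshold: float,
) -> bool:
    # Stage 1: pick the emotion whose model best explains the utterance.
    emotion = max(emotion_models,
                  key=lambda e: emotion_models[e].log_likelihood(features))

    # Stage 2: score the claim with the emotion-dependent speaker model and
    # accept when the log-likelihood ratio against the matching background
    # model exceeds the decision threshold.
    target = speaker_models[claimed_speaker][emotion].log_likelihood(features)
    impostor = background_models[emotion].log_likelihood(features)
    return (target - impostor) >= threshold
```

A consequence of this cascaded design is that the second stage can only be as reliable as the first-stage emotion recognizer, which is why the paper evaluates the combined system against an emotion-independent speaker verification approach and an emotion-dependent approach based entirely on HMMs.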
