Speaker identification in emotional talking environments based on CSPHMM2s

Speaker recognition systems perform almost ideally in neutral talking environments; however, they perform poorly in emotional talking environments. This research aims to improve the low performance of text-independent, emotion-dependent speaker identification in emotional talking environments by employing Second-Order Circular Suprasegmental Hidden Markov Models (CSPHMM2s) as classifiers. The approach has been tested on our speech database, which consists of 50 speakers talking in six emotional states: neutral, angry, sad, happy, disgust, and fear. Our results show that the average speaker identification performance in these talking environments based on CSPHMM2s is 81.50%, with improvement rates of 5.61%, 3.39%, and 3.06% compared with First-Order Left-to-Right Suprasegmental Hidden Markov Models (LTRSPHMM1s), Second-Order Left-to-Right Suprasegmental Hidden Markov Models (LTRSPHMM2s), and First-Order Circular Suprasegmental Hidden Markov Models (CSPHMM1s), respectively. The results of a subjective evaluation by human judges fall within 2.26% of those obtained with CSPHMM2s.
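
The abstract does not spell out how a CSPHMM2 scores an utterance. The following is a minimal sketch under stated assumptions and is not the authors' implementation. It uses discrete observation symbols for simplicity. It uses a toy circular topology in which each state may stay or advance to the next state, and the last state wraps around to the first. It runs a second-order forward recursion over state pairs. It also combines an acoustic log-likelihood and a prosodic (suprasegmental) log-likelihood with an assumed linear weight. The names `circular_transitions`, `toy_model`, `forward_log_likelihood`, and `identify_speaker`, and the parameter `weight`, are hypothetical. Identification selects the enrolled speaker whose model pair gives the highest combined score.

```python
import numpy as np


def circular_transitions(n_states, stay=0.6):
    """Toy second-order transitions A[i, j, k] = P(q_t=k | q_{t-2}=i, q_{t-1}=j).

    Assumed circular topology: from state j the chain may only stay in j or advance
    to (j + 1) % n_states, so the last state wraps around to the first. The
    second-order dependence is a hypothetical choice: staying is more likely when
    the chain also stayed at the previous step (i == j).
    """
    A = np.zeros((n_states, n_states, n_states))
    for i in range(n_states):
        for j in range(n_states):
            p_stay = min(stay + 0.2, 0.95) if i == j else stay
            A[i, j, j] += p_stay
            A[i, j, (j + 1) % n_states] += 1.0 - p_stay
    return A


def toy_model(B, stay=0.6):
    """Hypothetical helper: returns (pi, A1, A, B) for a circular second-order HMM."""
    n = B.shape[0]
    pi = np.zeros(n)
    pi[0] = 1.0  # always start in the first state
    # First-order circular matrix for the step q_1 -> q_2 (stay or advance).
    A1 = stay * np.eye(n) + (1.0 - stay) * np.roll(np.eye(n), 1, axis=1)
    return pi, A1, circular_transitions(n, stay), B


def forward_log_likelihood(obs, pi, A1, A, B):
    """log P(obs | model) for a second-order HMM with discrete emissions B[state, symbol].

    The forward variable runs over state pairs: alpha[j, k] is proportional to
    P(o_1..o_t, q_{t-1}=j, q_t=k). It is renormalised at every step, and the logs of
    the normalisers sum to the log-likelihood.
    """
    alpha = pi * B[:, obs[0]]
    c = alpha.sum()
    log_lik = np.log(c)
    alpha = alpha / c
    if len(obs) == 1:
        return float(log_lik)
    # t = 2: alpha[i, j] = alpha_1(i) * A1[i, j] * B[j, o_2]
    alpha = alpha[:, None] * A1 * B[:, obs[1]][None, :]
    c = alpha.sum()
    log_lik += np.log(c)
    alpha = alpha / c
    for o in obs[2:]:
        # alpha_new[j, k] = sum_i alpha[i, j] * A[i, j, k] * B[k, o]
        alpha = np.einsum("ij,ijk->jk", alpha, A) * B[:, o][None, :]
        c = alpha.sum()
        log_lik += np.log(c)
        alpha = alpha / c
    return float(log_lik)


def identify_speaker(acoustic_obs, prosodic_obs, speakers, weight=0.5):
    """Return the speaker whose (acoustic, prosodic) model pair scores highest.

    speakers maps a name to (acoustic_params, prosodic_params), each a (pi, A1, A, B)
    tuple. The linear weighting of the two log-likelihoods is an assumption about how
    the suprasegmental (prosodic) layer is combined with the acoustic layer.
    """
    scores = {}
    for name, (acoustic, prosodic) in speakers.items():
        scores[name] = ((1.0 - weight) * forward_log_likelihood(acoustic_obs, *acoustic)
                        + weight * forward_log_likelihood(prosodic_obs, *prosodic))
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    # Two hypothetical speakers: A mostly emits symbol 0, B mostly emits symbol 1.
    spk_a = toy_model(np.array([[0.9, 0.1], [0.8, 0.2], [0.9, 0.1]]))
    spk_b = toy_model(np.array([[0.1, 0.9], [0.2, 0.8], [0.1, 0.9]]))
    speakers = {"A": (spk_a, spk_a), "B": (spk_b, spk_b)}
    best, scores = identify_speaker([0, 0, 1, 0, 0], [0, 0, 0], speakers)
    print(best)  # prints "A" for this toy setup
```

The pair-state forward variable is what makes the model second order. Each step sums over the state two frames back, so the cost per frame grows as N^3 rather than N^2 for N states. The toy example reuses one model for both the acoustic and prosodic streams only to keep it short. In a realistic setup the two streams would come from separately trained models.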
