Automatic Speech Emotion and Speaker Recognition Based on Hybrid GMM and FFBNN

In this paper we present text-dependent speaker recognition, enhanced by first detecting the emotion of the speaker using a hybrid of the Feed Forward Back Propagation Neural Network (FFBNN) and Gaussian Mixture Model (GMM) methods. The emotional state of the speaker influences the recognition system. The Mel-frequency cepstral coefficient (MFCC) feature set is used for experimentation. To recognize the emotional state of a speaker, a GMM is used in the training phase and an FFBNN in the testing phase. A speech database of 25 speakers, recorded in five different emotional states (happy, angry, sad, surprise, and neutral), is used for experimentation. The results reveal that the emotional state of the speaker has a significant impact on the accuracy of speaker recognition.
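The abstract does not give the MFCC configuration, so the following is only a minimal sketch of the mel-scale warping that underlies MFCC features in general, not the authors' implementation. The conversion formula is the commonly used 2595 * log10(1 + f/700) variant; the filter count (20), the frequency range (0 to 8000 Hz), and all function names are illustrative assumptions.

```python
import math


def hz_to_mel(f_hz: float) -> float:
    """Map a frequency in Hz to the mel scale (common 2595*log10 variant)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_center_frequencies(n_filters: int, f_min_hz: float, f_max_hz: float) -> list:
    """Center frequencies (Hz) of n_filters filters spaced evenly on the mel scale.

    Hypothetical helper: an MFCC front end places triangular filters at
    such centers before taking log energies and a DCT.
    """
    lo, hi = hz_to_mel(f_min_hz), hz_to_mel(f_max_hz)
    step = (hi - lo) / (n_filters + 1)
    return [mel_to_hz(lo + step * (i + 1)) for i in range(n_filters)]


# Assumed settings for illustration only (not taken from the paper).
centers = mel_center_frequencies(n_filters=20, f_min_hz=0.0, f_max_hz=8000.0)
for c in centers:
    print(f"{c:.1f} Hz")
```

Because the centers are evenly spaced in mel rather than in Hz, they crowd together at low frequencies and spread out at high frequencies, which approximates the frequency resolution of human hearing.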
