Gender-Based Emotion Recognition System for Telugu Rural Dialects Using Hidden Markov Models

Automatic emotion recognition in speech is a research area with a wide range of applications in human interaction. The underlying mathematical framework is pattern recognition, which involves three stages: pre-processing, feature extraction, and classification. This paper introduces a procedure for emotion recognition using Hidden Markov Models (HMMs) to classify speech into five emotional states: anger, surprise, happiness, sadness, and a neutral state. The approach builds on standard speech recognition technology using continuous-density hidden Markov models, covering both the selection of low-level features and the design of the recognition system. An emotional speech database of Telugu Rural Dialects of Andhra Pradesh (TRDAP) was designed using several speakers' voices covering these emotional states. Across the five emotions and both genders, the highest recognition accuracy, 80% for anger, was achieved using the best-performing 39-dimensional feature vector per frame (13 MFCCs, 13 delta coefficients, and 13 acceleration coefficients) with an HMM classifier. This outcome closely matches the result obtained on the same database through subjective evaluation by human judges. Both gender-dependent and gender-independent experiments are conducted on the TRDAP emotional speech database.
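As a concrete illustration of the pipeline described above, the following Python sketch computes the 39-dimensional per-frame feature vector (13 MFCCs plus delta and acceleration coefficients) and classifies an utterance by training one Gaussian HMM per emotion. This is not the authors' implementation, which the abstract does not detail: the librosa and hmmlearn packages, the five-state diagonal-covariance topology, and the file paths are all assumptions made for illustration.

# Minimal sketch, assuming librosa and hmmlearn are available.
import numpy as np
import librosa
from hmmlearn.hmm import GaussianHMM

EMOTIONS = ["anger", "surprise", "happiness", "sadness", "neutral"]

def extract_features(wav_path, sr=16000):
    # 13 MFCCs plus first- and second-order dynamics -> (frames, 39)
    y, _ = librosa.load(wav_path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
    delta = librosa.feature.delta(mfcc)            # delta coefficients
    accel = librosa.feature.delta(mfcc, order=2)   # acceleration coefficients
    return np.vstack([mfcc, delta, accel]).T

def train_models(train_files):
    # train_files: dict mapping each emotion to a list of wav paths (hypothetical)
    models = {}
    for emotion in EMOTIONS:
        feats = [extract_features(path) for path in train_files[emotion]]
        X = np.concatenate(feats)
        lengths = [len(f) for f in feats]          # per-utterance frame counts
        # Five diagonal-covariance states is an assumed topology; the paper
        # does not report its HMM configuration.
        model = GaussianHMM(n_components=5, covariance_type="diag", n_iter=25)
        model.fit(X, lengths)
        models[emotion] = model
    return models

def classify(models, wav_path):
    # Maximum-likelihood decision: pick the emotion whose HMM scores highest.
    X = extract_features(wav_path)
    return max(EMOTIONS, key=lambda e: models[e].score(X))

The per-emotion likelihood comparison mirrors the standard HMM classification scheme: each emotion model is trained only on utterances of that emotion, and a test utterance is assigned to the model giving the highest log-likelihood.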
