Combination of generative models and SVM based classifier for speech emotion recognition

Modeling time series data of varying length is important in many domains. There are two paradigms for modeling such variable-length sequential data. Tasks such as speech recognition require modeling both the temporal dynamics and the correlations among the features, and hidden Markov models (HMMs) are used for these tasks. In tasks such as speaker recognition, audio classification, and speech emotion recognition, modeling the temporal dynamics is not critical, and Gaussian mixture models (GMMs) are commonly used. Generative models such as HMMs and GMMs focus on estimating the density of the data and are not well suited to classifying data from confusable classes. Discriminative classifiers such as support vector machines (SVMs), in contrast, are suited to fixed-dimensional patterns. In this paper, we propose a hybrid framework in which a generative front end represents the variable-length time series data and a discriminative model then performs the classification. Within this framework we propose two approaches, one score based and one based on segment modeling, and apply both to speech emotion recognition. Their performance is compared with that of an SVM classifier that uses different statistical features, and with that of GMM classifiers whose parameters are estimated by the maximum likelihood method and by the variational Bayes method. Both proposed approaches outperform the methods used for comparison.
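The idea of a generative front end feeding a fixed-dimensional classifier can be illustrated with a minimal sketch. The abstract does not say how the scores are computed, so everything below is an assumption made for illustration: each class is modeled by a single diagonal-covariance Gaussian (standing in for a full GMM), the score is the average frame log-likelihood, and the names `fit_diag_gaussian` and `score_vector` and the toy data are hypothetical. The SVM training step is omitted.

```python
import math

def fit_diag_gaussian(frames):
    """Fit one diagonal-covariance Gaussian (a stand-in for a GMM) to feature frames."""
    n, dim = len(frames), len(frames[0])
    mean = [sum(f[d] for f in frames) / n for d in range(dim)]
    # Variance floor keeps the log-density finite for near-constant features.
    var = [max(sum((f[d] - mean[d]) ** 2 for f in frames) / n, 1e-3)
           for d in range(dim)]
    return mean, var

def frame_log_likelihood(frame, model):
    """Log-density of one frame under a diagonal Gaussian."""
    mean, var = model
    return sum(-0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
               for x, m, v in zip(frame, mean, var))

def score_vector(sequence, models):
    """Map a sequence of any length to one average log-likelihood per class model."""
    return [sum(frame_log_likelihood(f, m) for f in sequence) / len(sequence)
            for m in models]

# Toy 2-D feature frames for two hypothetical emotion classes (assumed data).
calm_frames = [[0.0, 0.1], [0.2, -0.1], [-0.1, 0.0], [0.1, 0.2]]
angry_frames = [[3.0, 2.9], [3.2, 3.1], [2.8, 3.0], [3.1, 3.2]]
models = [fit_diag_gaussian(calm_frames), fit_diag_gaussian(angry_frames)]

# Utterances of different lengths map to vectors of the same dimension.
short_utt = [[0.1, 0.0], [0.0, 0.1]]
long_utt = [[3.0, 3.0], [2.9, 3.1], [3.1, 2.9], [3.0, 3.2], [3.2, 3.0]]
short_vec = score_vector(short_utt, models)
long_vec = score_vector(long_utt, models)
print(len(short_vec), len(long_vec))
```

Each utterance, whatever its length, becomes a vector with one entry per class model. Vectors of this kind are the fixed-dimensional patterns that a discriminative classifier such as an SVM could then be trained on.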
