Toward robust learning of the Gaussian mixture state emission densities for hidden Markov models

Gaussian mixture densities are an important class of state emission densities for the hidden Markov model (HMM). When training data are insufficient, the classical Baum-Welch algorithm often fails to learn these densities reliably because of the large number of free parameters in the model. In this paper, we propose a novel strategy for robustly and accurately learning the Gaussian mixture state emission densities of the HMM. The strategy is based on an ensemble framework for probability density estimation, in which the learning of the Gaussian mixture densities is formulated as a gradient descent search in a function space. The resulting learning algorithm is called “the boosting Baum-Welch algorithm.” Our preliminary experimental results on emotion recognition from speech show that the proposed algorithm outperforms the original Baum-Welch algorithm on this task.
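To make the function-space view concrete, the following is a minimal sketch of boosting-style density estimation for a one-dimensional Gaussian mixture. It is not the boosting Baum-Welch algorithm itself, whose details are not given in this abstract. It assumes a stagewise update F_t = (1 - a) F_{t-1} + a g_t, sample weights proportional to 1 / F_{t-1}(x_i) (the gradient of the log-likelihood with respect to the current density at each sample), a weighted maximum-likelihood Gaussian fit as the weak learner, and a small grid search over the step size a. All function names and parameter values are hypothetical.

```python
import math


def gauss_pdf(x, mean, var):
    """Density of a univariate Gaussian at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def mixture_pdf(x, components):
    """Density of a mixture given as a list of (weight, mean, var) tuples."""
    pdf = sum(w * gauss_pdf(x, m, v) for w, m, v in components)
    return max(pdf, 1e-300)  # floor to avoid log(0) and division by zero


def log_likelihood(data, components):
    return sum(math.log(mixture_pdf(x, components)) for x in data)


def fit_weighted_gaussian(data, weights, var_floor=1e-3):
    """Weighted maximum-likelihood fit of a single Gaussian (the weak learner)."""
    total = sum(weights)
    mean = sum(w * x for w, x in zip(weights, data)) / total
    var = sum(w * (x - mean) ** 2 for w, x in zip(weights, data)) / total
    return mean, max(var, var_floor)


def boost_gmm(data, rounds=5, alphas=(0.05, 0.1, 0.2, 0.3, 0.5)):
    """Stagewise mixture growth, assumed form: F_t = (1 - a) F_{t-1} + a g_t."""
    mean, var = fit_weighted_gaussian(data, [1.0] * len(data))
    components = [(1.0, mean, var)]
    for _ in range(rounds):
        # Gradient of sum_i log F(x_i) with respect to F(x_i) is 1 / F(x_i),
        # so poorly modeled samples receive larger weights.
        weights = [1.0 / mixture_pdf(x, components) for x in data]
        new_mean, new_var = fit_weighted_gaussian(data, weights)
        current = log_likelihood(data, components)
        best = None
        for a in alphas:
            candidate = [(w * (1.0 - a), m, v) for w, m, v in components]
            candidate.append((a, new_mean, new_var))
            ll = log_likelihood(data, candidate)
            if ll > current and (best is None or ll > best[0]):
                best = (ll, candidate)
        if best is None:  # no step size improves the likelihood: stop early
            break
        components = best[1]
    return components
```

Because a new component is accepted only when it raises the training log-likelihood, the sketch never does worse on the training data than its single-Gaussian starting point. In an HMM setting, one would presumably also weight each sample by its state occupancy probability from the forward-backward pass; that coupling is an assumption here and is not shown.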
