Gaussian Mixture Models Reduction by Variational Maximum Mutual Information

Gaussian mixture models (GMMs) are widely used in classification tasks, where it is often important to approximate high-order models by models with fewer components. This paper proposes a novel approach to this problem based on a parametric realization of the maximum mutual information (MMI) criterion and its approximation by a closed-form expression named variational-MMI (VMMI). The VMMI can be maximized in an analytically tractable manner, and it aims to improve the discrimination ability of the reduced set of models, a goal not targeted by previous approaches, which simplify each class-related GMM independently. Two effective algorithms are proposed and studied for optimizing the VMMI criterion: a steepest descent type algorithm, and an algorithm called line search A-functions (LSAF) that uses concave associated functions. Experiments on two speech-related tasks, phone recognition and language recognition, demonstrate that the VMMI-based parametric model reduction algorithms significantly outperform previous non-discriminative methods. In these experiments, the EM-like LSAF-based algorithm requires fewer iterations and converges to a better value of the objective function than the steepest descent algorithm.
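To make the contrast between class-independent and discriminative reduction concrete, here is a minimal sketch. It is not the paper's VMMI criterion or either of its optimization algorithms. It assumes 1-D GMMs, a sample-based MMI objective (the mean log posterior of the correct class), and a simple moment-matching collapse as a stand-in for a non-discriminative reduction. All function names are hypothetical.

```python
import numpy as np

def gmm_logpdf(x, weights, means, variances):
    """Log-density of a 1-D GMM at each point of x (illustrative helper)."""
    x = np.asarray(x, dtype=float)[:, None]          # shape (n, 1)
    w = np.asarray(weights, dtype=float)
    mu = np.asarray(means, dtype=float)
    var = np.asarray(variances, dtype=float)
    log_comp = np.log(w) - 0.5 * (np.log(2 * np.pi * var) + (x - mu) ** 2 / var)
    m = log_comp.max(axis=1, keepdims=True)          # log-sum-exp over components
    return m[:, 0] + np.log(np.exp(log_comp - m).sum(axis=1))

def mmi_objective(x, labels, class_gmms, priors):
    """Sample-based MMI: mean log posterior of the correct class."""
    labels = np.asarray(labels, dtype=int)
    log_joint = np.stack(
        [np.log(p) + gmm_logpdf(x, *g) for g, p in zip(class_gmms, priors)],
        axis=1,
    )                                                # shape (n, num_classes)
    m = log_joint.max(axis=1, keepdims=True)
    log_evidence = m[:, 0] + np.log(np.exp(log_joint - m).sum(axis=1))
    correct = log_joint[np.arange(len(labels)), labels]
    return float(np.mean(correct - log_evidence))

def collapse_to_single_gaussian(weights, means, variances):
    """Moment-matching reduction of one GMM to one component.

    This ignores the other classes entirely, which is the kind of
    class-independent simplification the abstract contrasts with VMMI.
    """
    w = np.asarray(weights, dtype=float)
    mu = np.asarray(means, dtype=float)
    var = np.asarray(variances, dtype=float)
    mean = float(np.sum(w * mu))
    variance = float(np.sum(w * (var + mu ** 2)) - mean ** 2)
    return [1.0], [mean], [variance]

# Toy data (assumed for illustration): class 0 is bimodal, class 1 sits in between.
rng = np.random.default_rng(0)
gmm0 = ([0.5, 0.5], [-3.0, 3.0], [1.0, 1.0])
gmm1 = ([1.0], [0.0], [1.0])
x0 = np.where(rng.random(500) < 0.5, rng.normal(-3, 1, 500), rng.normal(3, 1, 500))
x1 = rng.normal(0, 1, 500)
x = np.concatenate([x0, x1])
labels = np.concatenate([np.zeros(500, dtype=int), np.ones(500, dtype=int)])

full = mmi_objective(x, labels, [gmm0, gmm1], [0.5, 0.5])
reduced = mmi_objective(
    x, labels, [collapse_to_single_gaussian(*gmm0), gmm1], [0.5, 0.5]
)
print(f"MMI objective, full models:    {full:.3f}")
print(f"MMI objective, reduced class 0: {reduced:.3f}")
```

Comparing the two printed values shows how much discrimination a class-independent reduction gives up on this toy data; a discriminative method would instead choose the reduced parameters to keep this kind of objective high.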
