Paper information - Gaussian Mixture Models Reduction by Variational Maximum Mutual Information

Gaussian mixture models (GMMs) are widely used in classification tasks, where it is often important to approximate high-order models by models with fewer components. The paper proposes a novel approach to this problem based on a parametric realization of the maximum mutual information (MMI) criterion and its approximation by a closed-form expression named variational-MMI (VMMI). Maximization of the VMMI is analytically tractable and aims at improving the discrimination ability of the reduced set of models, a goal not targeted by previous approaches, which simplify each class-related GMM independently. Two effective algorithms are proposed and studied for optimizing the VMMI criterion. One is a steepest-descent-type algorithm; the other, called line search A-functions (LSAF), uses concave associated functions. Experiments on two speech-related tasks, phone recognition and language recognition, demonstrate that the VMMI-based parametric model reduction algorithms significantly outperform previous non-discriminative methods. According to these experiments, the EM-like LSAF-based algorithm requires fewer iterations and converges to a better value of the objective function than the steepest descent algorithm.
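The abstract does not state the VMMI expression itself, so it is not reproduced here. As a loosely related illustration of what a closed-form "variational" quantity between GMMs can look like, the sketch below implements the variational approximation of the Kullback-Leibler divergence between two GMMs due to Hershey and Olsen (2007). Whether and how the paper builds VMMI on this approximation is an assumption, and the function names `kl_gauss` and `variational_kl` are hypothetical, chosen for this example only.

```python
import numpy as np


def kl_gauss(mu0, cov0, mu1, cov1):
    """Closed-form KL(N(mu0, cov0) || N(mu1, cov1)) for full-covariance Gaussians."""
    d = mu0.shape[0]
    cov1_inv = np.linalg.inv(cov1)
    diff = mu1 - mu0
    _, logdet0 = np.linalg.slogdet(cov0)
    _, logdet1 = np.linalg.slogdet(cov1)
    return 0.5 * (
        np.trace(cov1_inv @ cov0) + diff @ cov1_inv @ diff - d + logdet1 - logdet0
    )


def variational_kl(w_f, mus_f, covs_f, w_g, mus_g, covs_g):
    """Variational approximation of KL(f || g) between two GMMs.

    f has weights w_f and components (mus_f[a], covs_f[a]);
    g has weights w_g and components (mus_g[b], covs_g[b]).
    Illustrative only: this is not claimed to be the paper's VMMI criterion.
    """
    total = 0.0
    for a in range(len(w_f)):
        # Similarity of component a to all components of f (includes itself).
        num = sum(
            w_f[a2] * np.exp(-kl_gauss(mus_f[a], covs_f[a], mus_f[a2], covs_f[a2]))
            for a2 in range(len(w_f))
        )
        # Similarity of component a to all components of g.
        den = sum(
            w_g[b] * np.exp(-kl_gauss(mus_f[a], covs_f[a], mus_g[b], covs_g[b]))
            for b in range(len(w_g))
        )
        total += w_f[a] * np.log(num / den)
    return total


# A two-component GMM f and a one-component candidate reduction g (toy values).
w_f = np.array([0.5, 0.5])
mus_f = [np.array([0.0]), np.array([2.0])]
covs_f = [np.eye(1), np.eye(1)]
w_g = np.array([1.0])
mus_g = [np.array([1.0])]
covs_g = [2.0 * np.eye(1)]

divergence = variational_kl(w_f, mus_f, covs_f, w_g, mus_g, covs_g)
```

Only closed-form Gaussian-to-Gaussian divergences are needed, so the quantity can be evaluated from model parameters alone, without sampling. For single-component mixtures it reduces to the exact Gaussian KL divergence, and it is zero when both mixtures are identical.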

Yuval Bistritz | Yossi Bar-Yosef
