Discriminatively Trained GMMs for Language Classification Using Boosting Methods

In language identification and other speech applications, discriminatively trained models often outperform nondiscriminative models trained with the maximum-likelihood criterion. For instance, discriminative Gaussian mixture models (GMMs) are typically trained by optimizing some discriminative criteria that can be computationally expensive and complex to implement. In this paper, we explore a novel approach to discriminative GMM training by using a variant the boosting framework (R. Schapire, ldquoThe boosting approach to machine learning, an overview,rdquo Proc. MSRI Workshop on Nonlinear Estimation and Classification, 2002) from machine learning, in which an ensemble of GMMs is trained sequentially. We have extended the purview of boosting to class conditional models (as opposed to discriminative models such as classification trees). The effectiveness of our boosting variation comes from the emphasis on working with the misclassified data to achieve discriminatively trained models. Our variant of boosting also includes utilizing low confidence data classifications as well as misclassified examples in classifier generation. We further apply our boosting approach to anti-models to achieve additional performance gains. We have applied our discriminative training approach to a variety of language identification experiments using the 12-language NIST 2003 language identification task. We show the significant performance improvements that can be obtained. The experiments include both acoustic as well as token-based speech models. Our best performing boosted GMM-based system on the 12-language verification task has a 2.3% EER.

[1]  Marc A. Zissman,et al.  Automatic language identification using Gaussian mixture and hidden Markov models , 1993, 1993 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[2]  Herbert Gish,et al.  Discriminatively trained Language Models using Support Vector Machines for Language Identification , 2006, 2006 IEEE Odyssey - The Speaker and Language Recognition Workshop.

[3]  William M. Campbell,et al.  Support vector machines for speaker verification and identification , 2000, Neural Networks for Signal Processing X. Proceedings of the 2000 IEEE Signal Processing Society Workshop (Cat. No.00TH8501).

[4]  Douglas A. Reynolds,et al.  Approaches to language identification using Gaussian mixture models and shifted delta cepstral features , 2002, INTERSPEECH.

[5]  Yoav Freund,et al.  A decision-theoretic generalization of on-line learning and an application to boosting , 1995, EuroCOLT.

[6]  Thorsten Joachims,et al.  Making large scale SVM learning practical , 1998 .

[7]  Lukás Burget,et al.  Discriminative Training Techniques for Acoustic Language Identification , 2006, 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings.

[8]  Robert E. Schapire,et al.  The strength of weak learnability , 1990, Mach. Learn..

[9]  Robert E. Schapire,et al.  The Boosting Approach to Machine Learning An Overview , 2003 .

[10]  Herbert Gish,et al.  Evaluation of word confidence for speech recognition systems , 1999, Comput. Speech Lang..

[11]  Andreas Stolcke,et al.  Finding consensus in speech recognition: word error minimization and other applications of confusion networks , 2000, Comput. Speech Lang..

[12]  W. Chou Discriminant-function-based minimum recognition error rate pattern-recognition approach to speech recognition , 2000, Proc. IEEE.

[13]  James R. Glass A probabilistic framework for segment-based speech recognition , 2003, Comput. Speech Lang..

[14]  Daniel Povey,et al.  Minimum Phone Error and I-smoothing for improved discriminative training , 2002, 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[15]  Herbert Gish,et al.  Improved language identification using support vector machines for language modeling , 2006, INTERSPEECH.

[16]  Herbert Gish,et al.  Boosting with anti-models for automatic language identification , 2007, INTERSPEECH.

[17]  Geoffrey Zweig,et al.  fMPE: discriminatively trained features for speech recognition , 2005, Proceedings. (ICASSP '05). IEEE International Conference on Acoustics, Speech, and Signal Processing, 2005..

[18]  Steve J. Young,et al.  MMIE training of large vocabulary recognition systems , 1997, Speech Communication.

[19]  Herbert Gish,et al.  A topic classification system based on parametric trajectory mixture models , 2003, INTERSPEECH.

[20]  Douglas A. Reynolds,et al.  Language identification using Gaussian mixture model tokenization , 2002, 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[21]  William M. Campbell,et al.  Support vector machines for speaker and language recognition , 2006, Comput. Speech Lang..

[22]  Hagai Aronowitz,et al.  Efficient Language Identification using Anchor Models and Support Vector Machines , 2006, 2006 IEEE Odyssey - The Speaker and Language Recognition Workshop.

[23]  William M. Campbell,et al.  Acoustic, phonetic, and discriminative approaches to automatic language identification , 2003, INTERSPEECH.

[24]  Lukás Burget,et al.  Use of Anti-Models to Further Improve State-of-the-Art PRLM Language Recognition System , 2006, 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings.

[25]  Jonathan G. Fiscus,et al.  A post-processing system to yield reduced word error rates: Recognizer Output Voting Error Reduction (ROVER) , 1997, 1997 IEEE Workshop on Automatic Speech Recognition and Understanding Proceedings.

[26]  Bin Ma,et al.  A Vector Space Modeling Approach to Spoken Language Identification , 2007, IEEE Transactions on Audio, Speech, and Language Processing.

[27]  Lalit R. Bahl,et al.  Maximum mutual information estimation of hidden Markov model parameters for speech recognition , 1986, ICASSP '86. IEEE International Conference on Acoustics, Speech, and Signal Processing.

[28]  Yonghong Yan,et al.  Automatic Language Identification with Discriminative Language Characterization Based on SVM , 2008, IEICE Trans. Inf. Syst..

[29]  Jean-Luc Gauvain,et al.  Discriminative Classifiers for Language Recognition , 2006, 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings.

[30]  Y. Freund,et al.  Discussion of the Paper \additive Logistic Regression: a Statistical View of Boosting" By , 2000 .

[31]  Pietro Laface,et al.  Acoustic language identification using fast discriminative training , 2007, INTERSPEECH.