Boosting with anti-models for automatic language identification

In this paper, we adopt the boosting framework to improve the performance of acoustic-based Gaussian mixture model (GMM) language identification (LID) systems. We introduce a set of low-complexity boosted target and anti-models, estimated from the training data to improve class separation and integrated during the LID backend process; because the models are low-complexity, estimation is fast. Experiments were performed on the 12-language NIST 2003 language recognition evaluation classification task using a GMM-acoustic-score-only LID system, as well as a system that combines GMM acoustic scores with sequence language model scores from GMM tokenization. Classification errors were reduced from 18.8% to 10.5% on the acoustic-score-only system, and from 11.3% to 7.8% on the combined acoustic and tokenization system.
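To make the backend fusion concrete, here is a minimal sketch (not the authors' code) of how boosted target/anti-model scores might be combined to pick a language. It assumes the per-round GMM log-likelihoods and per-round boosting weights have already been computed; all function and variable names are illustrative.

```python
import numpy as np

def backend_score(ll_target, ll_anti, alphas):
    """Fuse boosted target/anti-model scores (illustrative sketch).

    ll_target, ll_anti: arrays of shape (num_rounds, num_languages) with
        per-utterance average log-likelihoods from the boosted target models
        and their corresponding anti-models.
    alphas: per-round boosting weights, shape (num_rounds,).
    """
    # Each boosting round contributes a target-vs-anti log-likelihood ratio,
    # weighted by that round's boosting coefficient.
    llr = ll_target - ll_anti                    # (num_rounds, num_languages)
    return (alphas[:, None] * llr).sum(axis=0)   # (num_languages,)

def identify(ll_target, ll_anti, alphas, languages):
    # Pick the language with the highest fused score.
    scores = backend_score(np.asarray(ll_target),
                           np.asarray(ll_anti),
                           np.asarray(alphas))
    return languages[int(np.argmax(scores))]
```

The anti-model term penalizes languages whose competing (anti) models also score the utterance well, which is the class-separation effect the abstract describes; the actual paper's backend may differ in detail.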
