Boosting with anti-models for automatic language identification

In this paper, we adopt the boosting framework to improve the performance of acoustic-based Gaussian mixture model (GMM) language identification (LID) systems. We introduce a set of low-complexity boosted target and anti-models, estimated from the training data to improve class separation and integrated during the LID backend process; because the models are low-complexity, estimation is fast. Experiments were performed on the 12-language NIST 2003 language recognition evaluation classification task using a GMM-acoustic-score-only LID system, as well as a system that combines GMM acoustic scores with sequence language model scores from GMM tokenization. Classification errors were reduced from 18.8% to 10.5% on the acoustic-score-only system, and from 11.3% to 7.8% on the combined acoustic and tokenization system.
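To make the backend fusion concrete, here is a minimal sketch (not the authors' code) of how boosted target/anti-model scores might be combined to pick a language. It assumes the per-round GMM log-likelihoods and per-round boosting weights have already been computed; all function and variable names are illustrative.

```python
import numpy as np

def backend_score(ll_target, ll_anti, alphas):
    """Fuse boosted target/anti-model scores (illustrative sketch).

    ll_target, ll_anti: arrays of shape (num_rounds, num_languages) with
        per-utterance average log-likelihoods from the boosted target models
        and their corresponding anti-models.
    alphas: per-round boosting weights, shape (num_rounds,).
    """
    # Each boosting round contributes a target-vs-anti log-likelihood ratio,
    # weighted by that round's boosting coefficient.
    llr = ll_target - ll_anti                    # (num_rounds, num_languages)
    return (alphas[:, None] * llr).sum(axis=0)   # (num_languages,)

def identify(ll_target, ll_anti, alphas, languages):
    # Pick the language with the highest fused score.
    scores = backend_score(np.asarray(ll_target),
                           np.asarray(ll_anti),
                           np.asarray(alphas))
    return languages[int(np.argmax(scores))]
```

The anti-model term penalizes languages whose competing (anti) models also score the utterance well, which is the class-separation effect the abstract describes; the actual paper's backend may differ in detail.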
