Language score calibration using adapted Gaussian back-end

The generative Gaussian back-end and discriminative logistic regression are the most widely used approaches to language score fusion and calibration, and combining the two can significantly improve performance. This paper proposes an adapted Gaussian back-end, in which the mean of each language-dependent Gaussian is adapted from the mean of a language-independent background Gaussian via maximum a posteriori (MAP) estimation. Experiments are conducted on the LRE-07 evaluation data. Compared with the conventional Gaussian back-end on the closed-set task, relative improvements in Cavg of 50%, 17% and 4.2% are obtained on the 30s, 10s and 3s conditions, respectively. The estimated scores are also better calibrated. Combining the adapted back-end with logistic regression yields the system with the best-calibrated scores.

Index Terms: Language recognition, Gaussian back-end, Adaptation
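As a rough illustration of the mean adaptation described in the abstract, the sketch below uses the standard relevance-factor form of MAP mean estimation: the adapted mean is a count-weighted interpolation between the background mean and the sample mean of one language's score vectors. The abstract does not give the exact update used in the paper, so the formula, the relevance factor `tau`, and the function name `map_adapt_mean` are assumptions made for illustration only.

```python
from typing import List, Sequence


def map_adapt_mean(
    background_mean: Sequence[float],
    language_scores: Sequence[Sequence[float]],
    tau: float = 10.0,
) -> List[float]:
    """MAP-adapt a background Gaussian mean toward one language's data.

    background_mean: mean of the background Gaussian (one value per score dimension).
    language_scores: score vectors observed for a single target language.
    tau: hypothetical relevance factor; larger values keep the result
         closer to the background mean.
    """
    n = len(language_scores)
    dim = len(background_mean)
    if n == 0:
        # No adaptation data: fall back to the background mean.
        return list(background_mean)

    # Sample mean of the language's score vectors, per dimension.
    sample_mean = [sum(v[d] for v in language_scores) / n for d in range(dim)]

    # Interpolation weight grows with the amount of adaptation data.
    alpha = n / (n + tau)
    return [
        alpha * sample_mean[d] + (1.0 - alpha) * background_mean[d]
        for d in range(dim)
    ]


# Toy example with made-up numbers (not taken from the paper).
background = [0.0, 0.0]
scores = [[2.0, 4.0], [4.0, 8.0]]
adapted = map_adapt_mean(background, scores, tau=2.0)
```

With two vectors and `tau=2.0`, the weight on the data is 0.5, so the adapted mean lies halfway between the background mean and the sample mean. With little data for a language the estimate stays near the background mean, which is the usual motivation for adapting rather than estimating each language's mean independently.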
