Study of Different Backends in a State-Of-the-Art Language Recognition System

State of the art language recognition systems usually add a backend prior to the linear fusion of the subsystems scores. The backend plays a dual role. When the set of languages for which models have been trained does not match the set of target languages, the backend maps the available scores to the space of target languages. On the other hand, the backend serves as a precalibration stage that adapts the space of scores. In this work, well known backends (Generative Gaussian Backend, Discriminative Gaussian Backend and Logistic Regression Backend) and newer proposals (Fully Bayesian Gaussian Backend and Gaussian Mixture Backend) are analyzed and compared. The effect of applying a T-Norm or a ZT-Norm is also analyzed. Finally the effect of discarding development signals, those with the highest scores, is also studied. Experiments have been carried out on the NIST 2009 LRE database, using a state-of-theart Language Recognition System consisting of the fusion of five subsystems: A Linearized Eigenchannel GMM (LE-GMM) subsystem, an iVector subsystem and three phone-lattice-SVM subsystems. Best performance was attained by Gaussian Mixture Backend (1.25 EER), yielding 23% relative improvement with respect to the baseline (1.62 EER).

[1]  Chih-Jen Lin,et al.  LIBLINEAR: A Library for Large Linear Classification , 2008, J. Mach. Learn. Res..

[2]  Niko Brummer Generative, Fully Bayesian, Gaussian, Openset Pattern Classifier , 2013 .

[3]  Douglas A. Reynolds,et al.  Approaches to language identification using Gaussian mixture models and shifted delta cepstral features , 2002, INTERSPEECH.

[4]  Lukás Burget,et al.  Data selection and calibration issues in automatic language recognition - investigation with BUT-AGNITIO NIST LRE 2009 system , 2010, Odyssey.

[5]  Lukás Burget,et al.  Discriminative acoustic language recognition via channel-compensated GMM statistics , 2009, INTERSPEECH.

[6]  Roland Auckenthaler,et al.  Score Normalization for Text-Independent Speaker Verification Systems , 2000, Digit. Signal Process..

[7]  William M. Campbell,et al.  Acoustic, phonetic, and discriminative approaches to automatic language identification , 2003, INTERSPEECH.

[8]  Andreas Stolcke,et al.  SRILM - an extensible language modeling toolkit , 2002, INTERSPEECH.

[9]  William M. Campbell,et al.  Language recognition with discriminative keyword selection , 2008, 2008 IEEE International Conference on Acoustics, Speech and Signal Processing.

[10]  Lukás Burget,et al.  Language Recognition in iVectors Space , 2011, INTERSPEECH.

[11]  Jean-Luc Gauvain,et al.  Language score calibration using adapted Gaussian back-end , 2009, INTERSPEECH.

[12]  Alvin F. Martin,et al.  The 2011 NIST Language Recognition Evaluation , 2010, INTERSPEECH.

[13]  N. Brummer,et al.  On calibration of language recognition scores , 2006, 2006 IEEE Odyssey - The Speaker and Language Recognition Workshop.