The BLZ Submission to the NIST 2011 LRE: Data Collection, System Development and Performance

This paper describes the most relevant features of a collaborative multi-site submission to the NIST 2011 Language Recognition Evaluation (LRE), consisting of one primary and three contrastive systems, each fusing a different combination of 13 state-of-the-art acoustic and phonotactic language recognition subsystems. The collaboration focused on collecting and sharing training data for those target languages for which NIST provided little development data, and on defining a common development dataset used to train backend and fusion parameters and to select the best fusions. Official and post-key results are presented and compared, revealing that the greedy approach applied to select the best fusions provided suboptimal but very competitive performance. Several factors contributed to the high performance attained by the BLZ systems: the availability of training data for low-resource target languages, the reliability of the development dataset (consisting only of data audited by NIST), the diversity of modeling approaches, features and datasets among the subsystems considered for fusion, and the effectiveness of the search for optimal fusions.
