The BLZ Systems for the 2011 NIST Language Recognition Evaluation

This paper briefly describes the language recognition systems developed for the 2011 NIST Language Recognition Evaluation (LRE) by the BLZ (Bilbao-Lisboa-Zaragoza) team, a three-site joint effort comprising GTTS from the University of the Basque Country (Spain), L2F (Spoken Language Systems Lab) from INESC-ID Lisboa (Portugal) and I3A from the University of Zaragoza (Spain). The primary system fuses 8 subsystems (3 acoustic + 5 phonotactic): a Linearized Eigenchannel GMM (LE-GMM) subsystem, a JFA subsystem, an iVector subsystem, three Phone-SVM subsystems using the Brno University of Technology phone decoders for Czech, Hungarian and Russian, and two Phone-SVM subsystems using the L2F phone decoders for European Portuguese and Brazilian Portuguese. Gaussian backends and multiclass fusion are applied to obtain the final scores. Three contrastive systems have also been submitted: (1) the fusion of the whole set of 13 subsystems (6 acoustic + 7 phonotactic); (2) the fusion of 3 subsystems, one per site, chosen as the combination that yielded the best performance on development data; and (3) the fusion of the same 8 subsystems used in the primary system under a different configuration.
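To make the backend and fusion step more concrete, the following is a minimal Python sketch of one common way such a pipeline can be built. It is not the BLZ implementation: the abstract does not give the configuration, so the shared (pooled) covariance, the ridge term, the fixed fusion weights and all function names (`fit_gaussian_backend`, `gaussian_backend_loglik`, `fuse`) are assumptions made for illustration. In practice the fusion weights would typically be trained on development data rather than set by hand.

```python
import numpy as np


def fit_gaussian_backend(scores, labels, n_classes, ridge=1e-6):
    """Fit one mean per language and a shared within-class covariance.

    scores: (n_utterances, dim) raw subsystem scores (hypothetical layout)
    labels: (n_utterances,) integer language indices in [0, n_classes)
    """
    dim = scores.shape[1]
    means = np.zeros((n_classes, dim))
    pooled = np.zeros((dim, dim))
    for k in range(n_classes):
        x = scores[labels == k]
        means[k] = x.mean(axis=0)
        centered = x - means[k]
        pooled += centered.T @ centered
    # Assumption: a small ridge keeps the covariance invertible.
    cov = pooled / len(scores) + ridge * np.eye(dim)
    return means, cov


def gaussian_backend_loglik(scores, means, cov):
    """Per-language log-likelihoods, up to a constant shared by all classes."""
    precision = np.linalg.inv(cov)
    diff = scores[:, None, :] - means[None, :, :]  # (n, n_classes, dim)
    return -0.5 * np.einsum("nkd,de,nke->nk", diff, precision, diff)


def fuse(subsystem_logliks, weights):
    """Weighted sum of per-subsystem log-likelihood matrices (linear fusion)."""
    return sum(w * s for w, s in zip(weights, subsystem_logliks))


if __name__ == "__main__":
    # Synthetic stand-in for two subsystems scoring three languages.
    rng = np.random.default_rng(0)
    n_classes, n_per_class = 3, 200
    labels = np.repeat(np.arange(n_classes), n_per_class)
    fused_inputs = []
    for _ in range(2):
        raw = 5.0 * np.eye(n_classes)[labels] + rng.normal(size=(len(labels), n_classes))
        means, cov = fit_gaussian_backend(raw, labels, n_classes)
        fused_inputs.append(gaussian_backend_loglik(raw, means, cov))
    fused = fuse(fused_inputs, weights=[0.5, 0.5])
    print("accuracy on synthetic data:", np.mean(fused.argmax(axis=1) == labels))
```

Each subsystem gets its own backend here, and the calibrated log-likelihoods are then combined linearly. Fitting a single backend on the concatenated subsystem scores would be an equally plausible reading of the abstract.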
