The 2017 NIST Language Recognition Evaluation

In 2017, the U.S. National Institute of Standards and Technology (NIST) conducted the most recent in an ongoing series of Language Recognition Evaluations (LRE) meant to foster research in robust text- and speaker-independent language recognition as well as to measure the performance of current state-of-the-art systems. LRE17 was organized in a similar manner to LRE15, focusing on differentiating closely related languages (14 in total) drawn from 5 language clusters, namely Arabic, Chinese, English, Iberian, and Slavic. As in LRE15, LRE17 offered fixed and open training conditions to facilitate cross-system comparisons and to understand the impact of additional, unconstrained amounts of training data on system performance, respectively. There were, however, several differences between LRE17 and LRE15, most notably: 1) use of audio extracted from online videos (AfV) as development and test material, 2) release of a small development set which broadly matched the LRE17 test set, 3) system outputs in the form of log-likelihood scores, rather than log-likelihood ratios, and 4) an alternative cross-entropy based performance metric. A total of 25 research organizations, forming 18 teams, participated in this one-month-long evaluation and, combined, submitted 79 valid system outputs to be evaluated. This paper presents an overview of the evaluation and an analysis of system performance over all primary evaluation conditions. The evaluation results suggest that 1) language recognition on AfV data was, in general, more challenging than on telephony data, 2) top performing systems exhibited similar performance, 3) the greatest performance improvements were largely due to data augmentation and use of more complex models for data representation, and 4) effective use of the development set was essential for the top performing systems.
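The shift from log-likelihood ratios to per-language log-likelihood scores pairs naturally with a cross-entropy style metric: scores are combined with class priors and normalized into posteriors, and the cost is the average number of bits lost on the true language. The sketch below illustrates that general idea only; it is not NIST's exact LRE17 cost definition, and the language labels and uniform-prior default are assumptions for illustration.

```python
import math

def cross_entropy_cost(scores, truth, prior=None):
    """Average multiclass cross-entropy (in bits) of per-trial
    log-likelihood scores -- a generic sketch, not the official
    LRE17 metric definition.

    scores: list of dicts mapping language -> natural-log likelihood score
    truth:  list of true-language labels, one per trial
    prior:  dict mapping language -> prior probability (uniform if None)
    """
    langs = sorted(scores[0])
    if prior is None:
        prior = {lang: 1.0 / len(langs) for lang in langs}
    total = 0.0
    for trial, true_lang in zip(scores, truth):
        # Combine likelihoods with priors, then normalize to posteriors.
        joint = {lang: math.exp(trial[lang]) * prior[lang] for lang in langs}
        posterior = joint[true_lang] / sum(joint.values())
        total += -math.log2(posterior)  # bits lost on this trial
    return total / len(scores)
```

A perfectly confident, correct system drives the cost toward 0 bits, while scoring every language equally yields log2(number of languages) bits, which makes the metric easy to interpret as information gained over the prior.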
