Results of The 2015 NIST Language Recognition Evaluation

In 2015, NIST conducted the most recent in its ongoing series of Language Recognition Evaluations (LREs), which are intended to foster research in language recognition. The 2015 Language Recognition Evaluation featured 20 target languages grouped into 6 language clusters. The evaluation focused on distinguishing languages within each cluster, without disclosing which cluster the language of a test segment belongs to. The 2015 evaluation introduced several new aspects, such as limited, prescribed training data and a wider range of test-segment durations. Unlike in past LREs, systems were not required to output a hard decision for each target language and test segment. Instead, they were required to provide a vector of log-likelihood ratios indicating how likely it is that a test segment matches each target language. A total of 24 research organizations took part in this four-month-long evaluation and together submitted 167 systems. The results showed that the top-performing systems performed similarly, and that performance varied widely both across language clusters and across language pairs within a cluster. Among the 6 clusters, the French cluster was the hardest to recognize, with near-random performance, and the Slavic cluster was the easiest.
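The abstract notes that systems reported a vector of log-likelihood ratios (LLRs) rather than hard decisions. The code below is a minimal sketch and is not NIST's scoring code. It shows how such a vector could be turned into per-language yes/no decisions using the standard minimum-expected-cost (Bayes) threshold. The prior `p_target`, the unit costs, and the language labels `lang_a`, `lang_b`, `lang_c` are illustrative assumptions. The function names are hypothetical.

```python
import math
from typing import Dict


def bayes_threshold(p_target: float, c_miss: float = 1.0, c_fa: float = 1.0) -> float:
    """Minimum-expected-cost decision threshold on a log-likelihood ratio.

    Accept the target hypothesis when
    LLR > log(C_fa * (1 - P_target) / (C_miss * P_target)).
    The prior and costs are assumptions supplied by the caller.
    """
    return math.log((c_fa * (1.0 - p_target)) / (c_miss * p_target))


def hard_decisions(llrs: Dict[str, float], p_target: float = 0.5) -> Dict[str, bool]:
    """Turn one test segment's LLR vector into per-language yes/no decisions.

    `llrs` maps a (hypothetical) target-language label to the system's LLR
    that the segment is in that language.
    """
    theta = bayes_threshold(p_target)
    return {lang: llr > theta for lang, llr in llrs.items()}


# Hypothetical LLR vector for one test segment.
segment = {"lang_a": 2.3, "lang_b": -1.1, "lang_c": -4.0}
print(hard_decisions(segment))                # threshold log(1) = 0
print(hard_decisions(segment, p_target=0.1))  # threshold log(9), about 2.2
```

Because the LLR is kept separate from the prior, the same system output can be re-thresholded for any assumed target prior or cost setting. Hard decisions fix one operating point in advance.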