Language Recognition for Dialects and Closely Related Languages

This paper describes our development work to design a language recognition system that can discriminate closely related languages and dialects of the same language. The work was a joint effort by LIMSI and Vocapia Research in preparation for the NIST 2015 Language Recognition Evaluation (LRE). The language recognition system results from a fusion of four core classifiers: a phonotactic component using DNN acoustic models, two purely acoustic components using a RNN model and i-vector model, and a lexical component. Each component generates language posterior probabilities optimized to maximize the LID NCE, making their combination simple and robust. The motivation for using multiple components representing different speech knowledge is that some dialect distinctions may not be manifest at the acoustic level. We report experiments on the NIST LRE15 data and provide an analysis of the results and some post-evaluation contrasts. The 2015 LRE task focused on the identification of 20 languages clustered in 6 groups (Arabic, Chinese, English, French, Slavic and Iberic) of similar languages. Results are reported using the NIST Cavg metric which served as the primary metric for the OpenLRE15 evaluation. Results are also reported for the EER and the LER.

[1]  Jean-Luc Gauvain,et al.  Language recognition using phone latices , 2004, INTERSPEECH.

[2]  Shubha Kadambe,et al.  Language identification with phonological and lexical models , 1995, 1995 International Conference on Acoustics, Speech, and Signal Processing.

[3]  Jean-Luc Gauvain,et al.  Language identification using phone-based acoustic likelihoods , 1994, Proceedings of ICASSP '94. IEEE International Conference on Acoustics, Speech and Signal Processing.

[4]  Daniel Garcia-Romero,et al.  Analysis of i-vector Length Normalization in Speaker Recognition Systems , 2011, INTERSPEECH.

[5]  Lukás Burget,et al.  Language Recognition in iVectors Space , 2011, INTERSPEECH.

[6]  Jean-Luc Gauvain,et al.  Minimum word error training of RNN-based voice activity detection , 2015, INTERSPEECH.

[7]  Jean-Luc Gauvain,et al.  Language identification incorporating lexical information , 1998, ICSLP.

[8]  Marc A. Zissman,et al.  Comparison of : Four Approaches to Automatic Language Identification of Telephone Speech , 2004 .

[9]  Patrick Kenny,et al.  Bayesian Speaker Verification with Heavy-Tailed Priors , 2010, Odyssey.

[10]  Alex Graves,et al.  Supervised Sequence Labelling with Recurrent Neural Networks , 2012, Studies in Computational Intelligence.

[11]  Douglas A. Reynolds,et al.  Approaches to language identification using Gaussian mixture models and shifted delta cepstral features , 2002, INTERSPEECH.

[12]  John H. L. Hansen,et al.  An i-Vector PLDA based gender identification approach for severely distorted and multilingual DARPA RATS data , 2015, 2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU).

[13]  Sanjeev Khudanpur,et al.  Audio augmentation for speech recognition , 2015, INTERSPEECH.

[14]  Florin Curelaru,et al.  Front-End Factor Analysis For Speaker Verification , 2018, 2018 International Conference on Communications (COMM).

[15]  Jean-Luc Gauvain,et al.  Phonotactic Language Recognition Using MLP Features , 2012, INTERSPEECH.

[16]  Martine Adda-Decker,et al.  Language identification using lattice-based phonotactic and syllabotactic approaches , 2006, 2006 IEEE Odyssey - The Speaker and Language Recognition Workshop.

[17]  Jean-Luc Gauvain,et al.  Fusing language information from diverse data sources for phonotactic language recognition , 2012, Odyssey.

[18]  P J Webros BACKPROPAGATION THROUGH TIME: WHAT IT DOES AND HOW TO DO IT , 1990 .

[19]  Geoffrey E. Hinton,et al.  Speech recognition with deep recurrent neural networks , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.

[20]  Jürgen Schmidhuber,et al.  Long Short-Term Memory , 1997, Neural Computation.

[21]  Jean-Luc Gauvain,et al.  Improved n-gram phonotactic models for language recognition , 2010, INTERSPEECH.

[22]  James H. Elder,et al.  Probabilistic Linear Discriminant Analysis for Inferences About Identity , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[23]  Jean-Luc Gauvain,et al.  Identifying non-linguistic speech features , 1993, EUROSPEECH.

[24]  Daniel Povey,et al.  The Kaldi Speech Recognition Toolkit , 2011 .