Multilingual bottleneck features for language recognition

In this paper, we investigate multilingual Stacked Bottleneck Features (SBN) in the language recognition domain. These features are extracted using bottleneck neural networks trained on data from multiple languages. Previous results have shown the benefits of multilingual training of the SBN feature extractor for speech recognition; here we focus on its impact on language recognition. We present results obtained with monolingual and multilingual networks and with their fusions. Using multilingual features, we obtain a 16% relative improvement on the 3 s condition of the NIST LRE09 dataset with respect to features extracted by a network trained on a single language.
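The abstract does not spell out the network layout, so the following is only a minimal sketch of how multilingual stacked bottleneck features might be extracted. It rests on several assumptions that are not stated in the text above. The shared hidden layers end in a narrow, linear bottleneck. Each training language has its own softmax output layer. The "stacked" part is a two-stage cascade, in which the second network sees first-stage bottleneck outputs taken at a few frame offsets. The layer sizes, sigmoid activations, the offsets `(-10, -5, 0, 5, 10)`, and every name in the code (`MultilingualBottleneckNet`, `stack_context`, `relative_improvement`) are hypothetical. The weights are random and no training is shown, so the code only traces the data flow. The last helper shows what a "16% relative improvement" means for a lower-is-better metric, because the abstract does not name the metric.

```python
import math
import random
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]


def init_layer(n_in: int, n_out: int, rng: random.Random) -> Matrix:
    """Random weights: n_out rows of n_in weights followed by one bias term."""
    scale = 1.0 / math.sqrt(n_in)
    return [[rng.gauss(0.0, scale) for _ in range(n_in + 1)] for _ in range(n_out)]


def affine(w: Matrix, x: Vector) -> Vector:
    # zip stops after the n_in weights; row[-1] is the bias.
    return [sum(wi * xi for wi, xi in zip(row, x)) + row[-1] for row in w]


def sigmoid(v: Vector) -> Vector:
    return [1.0 / (1.0 + math.exp(-a)) for a in v]


def softmax(v: Vector) -> Vector:
    m = max(v)  # subtract the max for numerical stability
    e = [math.exp(a - m) for a in v]
    s = sum(e)
    return [a / s for a in e]


class MultilingualBottleneckNet:
    """Hypothetical layout: shared layers with a linear bottleneck and
    one softmax output layer ("head") per training language."""

    def __init__(self, n_in: int, n_hidden: int, n_bn: int,
                 targets_per_lang: Dict[str, int], seed: int = 0):
        rng = random.Random(seed)
        self.h1 = init_layer(n_in, n_hidden, rng)
        self.bn = init_layer(n_hidden, n_bn, rng)
        self.h2 = init_layer(n_bn, n_hidden, rng)
        self.heads = {lang: init_layer(n_hidden, n, rng)
                      for lang, n in targets_per_lang.items()}

    def bottleneck(self, x: Vector) -> Vector:
        # Linear bottleneck output: this is what is kept as the feature vector.
        return affine(self.bn, sigmoid(affine(self.h1, x)))

    def posteriors(self, x: Vector, lang: str) -> Vector:
        # Needed only for training: a frame is scored by its own language's head.
        h = sigmoid(affine(self.h2, self.bottleneck(x)))
        return softmax(affine(self.heads[lang], h))


def stack_context(frames: List[Vector], t: int,
                  offsets=(-10, -5, 0, 5, 10)) -> Vector:
    # Concatenate first-stage bottleneck outputs at the given frame offsets
    # (indices clamped to the utterance edges) as second-stage input.
    out: Vector = []
    for o in offsets:
        out.extend(frames[min(max(t + o, 0), len(frames) - 1)])
    return out


def relative_improvement(baseline: float, new: float) -> float:
    # For a lower-is-better metric, 16% relative means new = 0.84 * baseline.
    return (baseline - new) / baseline


if __name__ == "__main__":
    rng = random.Random(1)
    utt = [[rng.gauss(0.0, 1.0) for _ in range(24)] for _ in range(50)]  # 50 frames, 24 dims
    langs = {"lang_a": 30, "lang_b": 40}  # hypothetical target counts per language
    stage1 = MultilingualBottleneckNet(24, 64, 8, langs, seed=0)
    bn1 = [stage1.bottleneck(x) for x in utt]
    stage2 = MultilingualBottleneckNet(8 * 5, 64, 8, langs, seed=2)
    sbn = [stage2.bottleneck(stack_context(bn1, t)) for t in range(len(bn1))]
    print(len(sbn), len(sbn[0]))  # 50 8
```

The language-specific heads matter only during training. In this sketch, discarding them leaves a feature extractor that does not depend on any single training language, and its per-frame bottleneck outputs would then feed a language recognition back end.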
