Analysis of BUT-PT Submission for NIST LRE 2017

In this paper, we summarize our efforts in the NIST Language Recognition Evaluation (LRE) 2017, which resulted in systems with very competitive, state-of-the-art performance. We provide both descriptions and an analysis of the systems included in our submission. We explain how we partitioned the data provided by NIST for training and development, and then describe the features, DNN models, and classifiers used to produce the final systems. After covering the architecture of our submission, we concentrate on post-evaluation analysis. We compare different DNN bottleneck (BN) features, i-vector systems of different sizes and architectures, and different classifiers, and we present experimental results with data augmentation and with an improved architecture of the system based on DNN embeddings. We report the performance of the systems in the Fixed condition, where participants are required to use only predefined data sets. In addition to the official NIST LRE17 evaluation set, we also provide results on our internal development set, which can serve as a baseline for other researchers, since all training data are fixed and provided by NIST.
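The comparison of classifiers on top of i-vectors can be made concrete with a small illustration. The sketch below implements a Gaussian linear classifier (one mean per language and a single shared within-class covariance), which is a common back-end for i-vector language recognition. It is an assumption that it corresponds to any particular classifier in this submission; the function names `train_glc` and `score_glc` and the synthetic "i-vectors" are hypothetical and chosen only for illustration.

```python
import numpy as np

def train_glc(X, y, n_classes):
    """Fit a Gaussian linear classifier: per-class means, shared covariance.

    X: (n_samples, dim) array of i-vectors (hypothetical input format).
    y: (n_samples,) integer language labels in [0, n_classes).
    Returns linear weights W (n_classes, dim) and biases b (n_classes,).
    """
    dim = X.shape[1]
    means = np.zeros((n_classes, dim))
    within = np.zeros((dim, dim))
    for k in range(n_classes):
        Xk = X[y == k]
        means[k] = Xk.mean(axis=0)
        centered = Xk - means[k]
        within += centered.T @ centered
    within /= len(X)  # pooled within-class covariance (maximum likelihood)
    # Solve Sigma * w_k = mu_k rather than forming Sigma^{-1} explicitly.
    W = np.linalg.solve(within, means.T).T  # row k is Sigma^{-1} mu_k
    b = -0.5 * np.sum(W * means, axis=1)    # -0.5 mu_k^T Sigma^{-1} mu_k
    return W, b

def score_glc(X, W, b):
    """Per-class Gaussian log-likelihoods up to a class-independent term
    (flat priors assumed), so the scores are linear in X."""
    return X @ W.T + b

# Usage on synthetic data: three "languages" in a 5-dimensional space.
rng = np.random.default_rng(0)
true_means = rng.normal(scale=3.0, size=(3, 5))
y = np.repeat(np.arange(3), 200)
X = true_means[y] + rng.normal(size=(600, 5))

W, b = train_glc(X, y, n_classes=3)
pred = score_glc(X, W, b).argmax(axis=1)
print("training accuracy:", (pred == y).mean())
```

Because the covariance is shared, the quadratic term of the Gaussian log-likelihood is identical for all classes and cancels, which is why the classifier reduces to one linear function per language.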
