Analysis and Description of ABC Submission to NIST SRE 2016

We present a condensed description and analysis of the joint submission for NIST SRE 2016, by Agnitio, BUT and CRIM (ABC). We concentrate on challenges that arose during development and we analyze the results obtained on the evaluation data and on our development sets. We show that testing on mismatched, non-English and short duration data introduced in NIST SRE 2016 is a difficult problem for current state-of-theart systems. Testing on this data brought back the issue of score normalization and it also revealed that the bottleneck features (BN), which are superior when used for telephone English, are lacking in performance against the standard acoustic features like Mel Frequency Cepstral Coefficients (MFCCs). We offer ABC’s insights, findings and suggestions for building a robust system suitable for mismatched, non-English and relatively noisy data such as those in NIST SRE 2016.

[1]  D. Karlis An EM type algorithm for maximum likelihood estimation of the normal-inverse Gaussian distribution , 2002 .

[2]  Tara N. Sainath,et al.  Deep Neural Networks for Acoustic Modeling in Speech Recognition , 2012 .

[3]  Joaquín González-Rodríguez,et al.  Automatic language identification using deep neural networks , 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[4]  L. Burget,et al.  Promoting robustness for speaker modeling in the community: the PRISM evaluation set , 2011 .

[5]  Yan Song,et al.  i-vector representation based on bottleneck features for language identification , 2013 .

[6]  Sri Harish Reddy Mallidi,et al.  Neural Network Bottleneck Features for Language Identification , 2014, Odyssey.

[7]  Patrick Kenny,et al.  Modelling speaker and channel variability using deep neural networks for robust speaker verification , 2016, 2016 IEEE Spoken Language Technology Workshop (SLT).

[8]  Lukás Burget,et al.  Discriminatively trained Probabilistic Linear Discriminant Analysis for speaker verification , 2011, 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[9]  Seyed Omid Sadjadi,et al.  The IBM 2016 Speaker Recognition System , 2016, Odyssey.

[10]  James H. Elder,et al.  Probabilistic Linear Discriminant Analysis for Inferences About Identity , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[11]  Niko Brümmer,et al.  A comparison of linear and non-linear calibrations for speaker recognition , 2014, Odyssey.

[12]  Lukás Burget,et al.  Analysis of Score Normalization in Multilingual Speaker Recognition , 2017, INTERSPEECH.

[13]  Lukás Burget,et al.  Migrating i-vectors between speaker recognition systems using regression neural networks , 2015, INTERSPEECH.

[14]  Pavel Matejka,et al.  Multilingual bottleneck features for language recognition , 2015, INTERSPEECH.

[15]  Douglas A. Reynolds,et al.  A unified deep neural network for speaker and language recognition , 2015, INTERSPEECH.

[16]  T. Minka Estimating a Dirichlet distribution , 2012 .

[17]  Florin Curelaru,et al.  Front-End Factor Analysis For Speaker Verification , 2018, 2018 International Conference on Communications (COMM).

[18]  Pietro Laface,et al.  On the use of i–vector posterior distributions in Probabilistic Linear Discriminant Analysis , 2014, IEEE/ACM Transactions on Audio, Speech, and Language Processing.

[19]  Lukás Burget,et al.  Analysis of DNN approaches to speaker identification , 2016, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[20]  Yun Lei,et al.  A novel scheme for speaker recognition using a phonetically-aware deep neural network , 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[21]  Lukás Burget,et al.  Analysis of the DNN-based SRE systems in multi-language conditions , 2016, 2016 IEEE Spoken Language Technology Workshop (SLT).

[22]  Pietro Laface,et al.  Pairwise Discriminative Speaker Verification in the ${\rm I}$-Vector Space , 2011, IEEE Transactions on Audio, Speech, and Language Processing.

[23]  Niko Brümmer,et al.  A Generative Model for Score Normalization in Speaker Recognition , 2017, INTERSPEECH.

[24]  Alan McCree,et al.  Improving speaker recognition performance in the domain adaptation challenge using deep neural networks , 2014, 2014 IEEE Spoken Language Technology Workshop (SLT).