THUEE System for NIST SRE19 CTS Challenge

In this paper, we present the system that THUEE submitted to the NIST 2019 Speaker Recognition Evaluation CTS Challenge (SRE19). As in previous SREs, domain mismatches between the training and evaluation sets, such as cross-lingual and cross-channel mismatches, remain the major challenge in this evaluation. To improve the robustness of our systems, we develop deeper and wider x-vector architectures. We also use novel speaker-discriminative embedding systems: hybrid multitask learning architectures that incorporate phonetic information. To deal with domain mismatches, we follow a heuristic search scheme to select the best back-end strategy based on a limited development corpus. An extended and factorized TDNN achieves the best single-system results on the SRE18 DEV and SRE19 EVAL sets. The final system is a fusion of six subsystems, which yields an EER of 2.81% and a minimum cost of 0.262 on the SRE19 EVAL set.
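For readers unfamiliar with the EER figure quoted above, the following is a minimal sketch of how an equal error rate can be computed from trial scores. It is an illustration only, not the scoring code used for the evaluation; the function name `equal_error_rate` and the toy scores are assumptions introduced here, and the threshold sweep is a simple approximation rather than an interpolated DET-curve computation.

```python
import numpy as np


def equal_error_rate(target_scores, nontarget_scores):
    """Return an approximate EER (as a fraction) for two sets of trial scores.

    Hypothetical helper for illustration; higher scores are assumed to mean
    "more likely the same speaker".
    """
    target = np.asarray(target_scores, dtype=float)
    nontarget = np.asarray(nontarget_scores, dtype=float)

    # Candidate decision thresholds: every observed score.
    thresholds = np.sort(np.concatenate([target, nontarget]))

    # Miss rate: fraction of target trials scoring below the threshold.
    p_miss = np.array([(target < t).mean() for t in thresholds])
    # False-alarm rate: fraction of non-target trials at or above the threshold.
    p_fa = np.array([(nontarget >= t).mean() for t in thresholds])

    # The EER is taken at the threshold where the two error rates are closest.
    i = np.argmin(np.abs(p_miss - p_fa))
    return (p_miss[i] + p_fa[i]) / 2.0


if __name__ == "__main__":
    # Toy scores, made up for this example.
    tgt = [2.0, 1.5, 0.2, 1.0]
    non = [-1.0, 0.5, -0.3, 0.1]
    print(f"EER = {100 * equal_error_rate(tgt, non):.2f}%")
```

With real evaluation data the score lists would hold one entry per trial, and a finer interpolation between thresholds would typically be used.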
