Language-aware PLDA for multilingual speaker recognition

Multilingual speaker recognition uses speech data from several languages in model training, which enables the system to handle recognition requests in multiple languages. Pooling data across languages enlarges the training set, but it inevitably introduces probability dispersion because the language conditions become more complex. This paper proposes a language-aware training approach for probabilistic linear discriminant analysis (PLDA) that incorporates language information when the PLDA model is trained. The approach was evaluated within the i-vector/PLDA framework on the CSLT-CUDGT2014 Chinese-Uyghur bilingual speech database. Experimental results show that language-aware PLDA training yields a relative equal error rate (EER) reduction of 15.38% on the Chinese test and 20.07% on the Uyghur test.
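For readers unfamiliar with the metric, a relative EER reduction is conventionally the drop in EER divided by the baseline EER. The sketch below illustrates that arithmetic only. The function name and the input values are hypothetical and are not taken from the paper, which reports only the relative figures above.

```python
def relative_eer_reduction(baseline_eer: float, new_eer: float) -> float:
    """Return the relative EER reduction, in percent, of new_eer over baseline_eer."""
    if baseline_eer <= 0:
        raise ValueError("baseline_eer must be positive")
    return 100.0 * (baseline_eer - new_eer) / baseline_eer


if __name__ == "__main__":
    # Hypothetical EER values in percent, chosen for illustration only;
    # they are NOT results from the paper.
    baseline = 4.0
    improved = 3.0
    print(f"{relative_eer_reduction(baseline, improved):.2f}%")
```

A relative reduction is independent of the absolute operating point, so the same percentage can correspond to very different absolute EER changes on the two test sets.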
