SNR-dependent mixture of PLDA for noise robust speaker verification

This paper proposes a mixture of SNR-dependent PLDA models that covers a wider region of the i-vector space, so that the resulting i-vector/PLDA system can handle test utterances with a wide range of SNR. To maximise coordination among the PLDA models, they are trained simultaneously via an EM algorithm using utterances contaminated with noise at various levels. The contribution of a training i-vector to each PLDA model is determined by the posterior probability of the utterance’s SNR. Given a test i-vector, the marginal likelihoods from the individual PLDA models are linearly combined based on the SNR posterior probabilities of the test utterance and the target speaker’s utterance. Verification scores are ratios of these marginal likelihoods. Results on NIST 2012 SRE suggest that this soft-decision scheme is particularly suitable for situations where the test utterances exhibit a wide range of SNR.
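The soft-decision idea can be illustrated with a minimal Python sketch. It is not the paper's implementation: it assumes that the SNR posteriors come from a one-dimensional Gaussian mixture over SNR in dB, that the per-model PLDA log-likelihoods have already been computed elsewhere, and that each pair of models is weighted by the product of the target and test posteriors. All function and parameter names (`snr_posteriors`, `log_weighted_sum`, `mixture_llr`, `log_joint`, and so on) are hypothetical.

```python
import math


def snr_posteriors(snr_db, priors, means, stds):
    """Posterior P(k | snr) under an assumed 1-D Gaussian mixture over SNR (dB)."""
    # Log of prior * Gaussian density, dropping the constant shared by all k.
    log_terms = [
        math.log(p) - math.log(s) - 0.5 * ((snr_db - m) / s) ** 2
        for p, m, s in zip(priors, means, stds)
    ]
    top = max(log_terms)  # subtract the maximum for numerical stability
    unnorm = [math.exp(t - top) for t in log_terms]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def log_weighted_sum(weights, log_likes):
    """Return log(sum_k w_k * exp(log_likes_k)), skipping zero weights."""
    terms = [math.log(w) + ll for w, ll in zip(weights, log_likes) if w > 0.0]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))


def mixture_llr(post_target, post_test, log_joint, log_target, log_test):
    """Log-likelihood-ratio score from posterior-weighted marginal likelihoods.

    log_joint[j][k]: assumed log-likelihood that target and test i-vectors share
                     a speaker, with the target under model j and the test under k.
    log_target[j], log_test[k]: assumed per-model marginal log-likelihoods.
    The product weighting of model pairs is an assumption of this sketch.
    """
    n = len(post_test)
    pair_weights = [post_target[j] * post_test[k] for j in range(n) for k in range(n)]
    pair_log_likes = [log_joint[j][k] for j in range(n) for k in range(n)]
    same_speaker = log_weighted_sum(pair_weights, pair_log_likes)
    different_speaker = (
        log_weighted_sum(post_target, log_target)
        + log_weighted_sum(post_test, log_test)
    )
    return same_speaker - different_speaker
```

In this sketch an utterance whose SNR lies between two mixture components contributes to both models in proportion to its posteriors, instead of being assigned to a single model by a hard threshold.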
