SNR-invariant PLDA modeling for robust speaker verification

In spite of the great success of the i-vector/PLDA framework, speaker verification in noisy environments remains a challenge. To compensate for the variability of i-vectors caused by different levels of background noise, this paper proposes a new framework, namely SNR-invariant PLDA, for robust speaker verification. By assuming that i-vectors extracted from utterances falling within a narrow SNR range share similar SNRspecific information, the paper introduces an SNR factor to the conventional PLDA model. Then, the SNR-related variability and the speaker-related variability embedded in the i-vectors are modeled by the SNR factor and the speaker factor, respectively. Accordingly, an i-vector is represented by a linear combination of three components: speaker, SNR, and channel. During verification, the variability due to SNR and channels are marginalized out when computing the marginal likelihood ratio. Experiments based on NIST 2012 SRE show that SNR-invariant PLDA achieves superior performance when compared with the conventional PLDA and SNR-dependent mixture of PLDA.

[1]  DeLiang Wang,et al.  Incorporating Auditory Feature Uncertainties in Robust Speaker Identification , 2007, 2007 IEEE International Conference on Acoustics, Speech and Signal Processing - ICASSP '07.

[2]  Andreas Stolcke,et al.  Within-class covariance normalization for SVM-based speaker recognition , 2006, INTERSPEECH.

[3]  Seyed Omid Sadjadi,et al.  Nearest neighbor discriminant analysis for robust speaker recognition , 2014, INTERSPEECH.

[4]  James H. Elder,et al.  Probabilistic Linear Discriminant Analysis for Inferences About Identity , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[5]  Daniel Garcia-Romero,et al.  Multicondition training of Gaussian PLDA models in i-vector space for noise and reverberation robust speaker recognition , 2012, 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[6]  John H. L. Hansen,et al.  CRSS systems for 2012 NIST Speaker Recognition Evaluation , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.

[7]  The NIST Year 2012 Speaker Recognition Evaluation Plan 1 I , 2022 .

[8]  Man-Wai Mak,et al.  SNR-Invariant PLDA Modeling in Nonparametric Subspace for Robust Speaker Verification , 2015, IEEE/ACM Transactions on Audio, Speech, and Language Processing.

[9]  Daniel Garcia-Romero,et al.  Analysis of i-vector Length Normalization in Speaker Recognition Systems , 2011, INTERSPEECH.

[10]  龚迪洪 Hidden Factor Analysis for Age Invariant Face Recognition , 2013 .

[11]  John H. L. Hansen,et al.  I4u submission to NIST SRE 2012: a large-scale collaborative effort for noise-robust speaker verification , 2013, INTERSPEECH.

[12]  Sridha Sridharan,et al.  Feature warping for robust speaker verification , 2001, Odyssey.

[13]  Sourjya Sarkar,et al.  A novel boosting algorithm for improved i-vector based speaker verification in noisy environments , 2014, INTERSPEECH.

[14]  Patrick Kenny,et al.  Joint Factor Analysis Versus Eigenchannels in Speaker Recognition , 2007, IEEE Transactions on Audio, Speech, and Language Processing.

[15]  Dahua Lin,et al.  Nonparametric Discriminant Analysis for Face Recognition , 2009, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[16]  DeLiang Wang,et al.  Robust Speaker Identification in Noisy and Reverberant Conditions , 2014, IEEE/ACM Trans. Audio, Speech & Language Processing.

[17]  Man-Wai Mak,et al.  Comparison of Voice Activity Detectors for Interview Speech in NIST Speaker Recognition Evaluation , 2011, INTERSPEECH.

[18]  David A. van Leeuwen,et al.  Knowing the non-target speakers: The effect of the i-vector population for PLDA training in speaker recognition , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.

[19]  Man-Wai Mak,et al.  SNR-dependent mixture of PLDA for noise robust speaker verification , 2014, INTERSPEECH.

[20]  John H. L. Hansen,et al.  Mean Hilbert Envelope Coefficients (MHEC) for Robust Speaker Recognition , 2012, INTERSPEECH.

[21]  DeLiang Wang,et al.  Robust speaker identification using auditory features and computational auditory scene analysis , 2008, 2008 IEEE International Conference on Acoustics, Speech and Signal Processing.

[22]  R Togneri,et al.  An Overview of Speaker Identification: Accuracy and Robustness Issues , 2011, IEEE Circuits and Systems Magazine.

[23]  James R. Glass,et al.  Robust Speaker Recognition in Noisy Conditions , 2007, IEEE Transactions on Audio, Speech, and Language Processing.

[24]  Simon J. D. Prince,et al.  Computer Vision: Models, Learning, and Inference , 2012 .

[25]  David A. van Leeuwen,et al.  Source normalization for language-independent speaker recognition using i-vectors , 2012, Odyssey.

[26]  Radford M. Neal Pattern Recognition and Machine Learning , 2007, Technometrics.

[27]  Tomi Kinnunen,et al.  Effect of multicondition training on i-vector PLDA configurations for speaker recognition , 2013, INTERSPEECH.

[28]  Yun Lei,et al.  Towards noise-robust speaker recognition using probabilistic linear discriminant analysis , 2012, 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[29]  B. Atal Effectiveness of linear prediction characteristics of the speech wave for automatic speaker identification and verification. , 1974, The Journal of the Acoustical Society of America.

[30]  David A. van Leeuwen,et al.  The effect of noise on modern automatic speaker recognition systems , 2012, 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[31]  Patrick Kenny,et al.  Front-End Factor Analysis for Speaker Verification , 2011, IEEE Transactions on Audio, Speech, and Language Processing.

[32]  Patrick Kenny,et al.  Bayesian Speaker Verification with Heavy-Tailed Priors , 2010, Odyssey.

[33]  Qi Li,et al.  Robust speaker identification using an auditory-based feature , 2010, 2010 IEEE International Conference on Acoustics, Speech and Signal Processing.

[34]  Man-Wai Mak,et al.  A study of voice activity detection techniques for NIST speaker recognition evaluations , 2014, Comput. Speech Lang..