SNR-Invariant PLDA Modeling in Nonparametric Subspace for Robust Speaker Verification

While the i-vector/PLDA framework has achieved great success, its performance still degrades dramatically under noisy conditions. To compensate for the variability of i-vectors caused by different levels of background noise, this paper proposes an SNR-invariant PLDA framework for robust speaker verification. First, nonparametric feature analysis (NFA) is employed to suppress intra-speaker variation and emphasize the discriminative information inherent in the boundaries between speakers in the i-vector space. Then, in the NFA-projected subspace, SNR-invariant PLDA is applied to separate SNR-specific information from speaker-specific information using an identity factor and an SNR factor. Accordingly, a projected i-vector in the NFA subspace can be represented as a linear combination of three components: speaker, SNR, and channel. During verification, the variability due to SNR and channels is integrated out when computing the marginal likelihood ratio. Experiments on NIST 2012 SRE show that the proposed framework outperforms both conventional PLDA and the SNR-dependent mixture of PLDA.
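The pipeline described above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The SNR-invariant PLDA model is written as x = m + V h + U w + ε, with a speaker (identity) factor h, an SNR factor w, and a residual ε standing for channel variability. EM training is skipped: the demo draws synthetic data from made-up parameters, and because NFA is a linear projection P, those parameters map exactly to P^T m, P^T V, P^T U and P^T Σ P in the subspace. Further assumptions: the NFA weight uses the usual nonparametric-discriminant-analysis form, the within-speaker scatter uses speaker means, and at scoring time the SNR factors of the two utterances are treated as independent. The function names (`nfa_projection`, `snr_invariant_llr`) and all numeric values are hypothetical.

```python
import numpy as np


def nfa_projection(X, labels, k=3, alpha=2.0, out_dim=2):
    """Sketch of nonparametric feature analysis (NFA); returns P of shape (D, out_dim).

    Between-speaker scatter compares each i-vector with the mean of its k
    nearest neighbours in every other speaker, weighted so that vectors near
    a speaker boundary count most. Within-speaker scatter uses speaker means,
    which is an assumption of this sketch. k and alpha are illustrative values.
    """
    dim = X.shape[1]
    speakers = np.unique(labels)
    Sb = np.zeros((dim, dim))
    Sw = np.zeros((dim, dim))
    for i in speakers:
        Xi = X[labels == i]
        centred = Xi - Xi.mean(axis=0)
        Sw += centred.T @ centred
        for x in Xi:
            # distance to the k-th nearest same-speaker neighbour (index 0 is x itself)
            d_own = np.sort(np.linalg.norm(Xi - x, axis=1))
            d_own_k = d_own[min(k, len(d_own) - 1)]
            for j in speakers:
                if j == i:
                    continue
                Xj = X[labels == j]
                d_other = np.linalg.norm(Xj - x, axis=1)
                nn = np.argsort(d_other)[:k]  # k nearest neighbours in speaker j
                a, b = d_own_k ** alpha, d_other[nn[-1]] ** alpha
                weight = min(a, b) / (a + b)  # about 0.5 near a boundary, near 0 far away
                v = x - Xj[nn].mean(axis=0)
                Sb += weight * np.outer(v, v)
    # Solve Sb u = lambda Sw u by whitening with the Cholesky factor of Sw
    L_inv = np.linalg.inv(np.linalg.cholesky(Sw))
    _, evecs = np.linalg.eigh(L_inv @ Sb @ L_inv.T)  # eigenvalues in ascending order
    return L_inv.T @ evecs[:, ::-1][:, :out_dim]


def log_gauss(z, C):
    """log N(z; 0, C) for a full covariance matrix C."""
    _, logdet = np.linalg.slogdet(C)
    return -0.5 * (logdet + z @ np.linalg.solve(C, z) + len(z) * np.log(2 * np.pi))


def snr_invariant_llr(xs, xt, m, V, U, Sigma):
    """Log marginal likelihood ratio for x = m + V h + U w + eps.

    h is shared under the same-speaker hypothesis; w and eps are integrated
    out and assumed independent across the two utterances (sketch assumption).
    """
    S_spk = V @ V.T
    S_tot = S_spk + U @ U.T + Sigma
    zero = np.zeros_like(S_spk)
    z = np.concatenate([xs - m, xt - m])
    same = np.block([[S_tot, S_spk], [S_spk, S_tot]])
    diff = np.block([[S_tot, zero], [zero, S_tot]])
    return log_gauss(z, same) - log_gauss(z, diff)


# Synthetic demo with made-up parameters (no real i-vectors involved)
rng = np.random.default_rng(0)
D, n_spk, n_snr = 6, 20, 3
m = rng.normal(size=D)
V = 2.0 * rng.normal(size=(D, 2))     # speaker loading matrix
U = 1.0 * rng.normal(size=(D, 1))     # SNR loading matrix
Sigma = 0.1 * np.eye(D)               # residual (channel) covariance
H = rng.normal(size=(n_spk, 2))       # one speaker factor per speaker
w_snr = rng.normal(size=(n_snr, 1))   # one SNR factor per SNR group


def sample(spk, snr):
    eps = rng.multivariate_normal(np.zeros(D), Sigma)
    return m + V @ H[spk] + U @ w_snr[snr] + eps


X = np.array([sample(s, r) for s in range(n_spk) for r in range(n_snr) for _ in range(2)])
labels = np.repeat(np.arange(n_spk), 2 * n_snr)
P = nfa_projection(X, labels, out_dim=4)
# A linear projection maps the model parameters exactly into the NFA subspace
p = dict(m=P.T @ m, V=P.T @ V, U=P.T @ U, Sigma=P.T @ Sigma @ P)
target = snr_invariant_llr(P.T @ sample(0, 0), P.T @ sample(0, 2), **p)
nontarget = snr_invariant_llr(P.T @ sample(0, 0), P.T @ sample(1, 2), **p)
print(f"target LLR {target:.2f}, non-target LLR {nontarget:.2f}")
```

Because every factor is Gaussian, integrating out h, w and ε leaves two zero-mean Gaussians over the stacked pair that differ only in the cross-covariance block (V V^T under the same-speaker hypothesis, zero otherwise), so each score costs two log-determinants and two linear solves.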
