Towards Fully Bayesian Speaker Recognition: Integrating Out the Between-Speaker Covariance

We propose a variational Bayes solution for integrating out the model parameters of a generative i-vector speaker recognizer. The current state of the art in generative i-vector modelling plugs in fixed maximum-likelihood point estimates of the model parameters. This recipe can over-fit, particularly when estimating the between-speaker covariance. We show how to integrate out the between-speaker covariance and demonstrate dramatic improvements on NIST SRE 2010.
