Unsupervised adaptation of PLDA by using variational Bayes methods

State-of-the-art speaker recognition relies on models that need a large amount of training data. These models succeed in tasks like NIST SRE because sufficient data is available. However, in real applications we usually do not have that much data and, in many cases, the speaker labels are unknown. We present a method to adapt a PLDA model from a domain with a large amount of labeled data to another domain with unlabeled data. We describe a generative model that produces both sets of data, in which the unknown labels are modeled as latent variables. We use variational Bayes to estimate the hidden variables. We performed experiments adapting a model trained on Switchboard to NIST SRE without labels. The adapted model is evaluated on NIST SRE10. Compared to the non-adapted model, EER improved by 42% when adapting with 200 NIST speakers and by 49% when adapting with all the NIST speakers.
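
The idea of treating unknown speaker labels as latent variables and estimating them with variational Bayes can be illustrated with a minimal sketch. This is not the PLDA model of the paper: it is a toy one-dimensional analogue in which each speaker has a scalar variable drawn from a Gaussian prior, each observation is that variable plus Gaussian noise, and the model parameters are held fixed instead of being adapted. The function name `vb_unlabeled_speakers`, the parameters `mu`, `sb2`, and `sw2`, and the quantile-based initialisation are all assumptions made for illustration.

```python
import math
import random


def vb_unlabeled_speakers(x, n_speakers, mu=0.0, sb2=25.0, sw2=1.0, n_iter=50):
    """Mean-field VB for a toy 1-D model with unknown speaker labels.

    Assumed model (illustrative only, not the paper's PLDA):
      y_k ~ N(mu, sb2)              speaker variable
      x_i | z_i = k ~ N(y_k, sw2)   observation, z_i uniform over speakers
    Returns responsibilities q(z_i = k) and the Gaussian posteriors
    q(y_k) = N(m[k], v[k]).
    """
    n = len(x)
    # Deterministic initialisation: spread the speaker means over data quantiles.
    xs = sorted(x)
    m = [xs[int((k + 0.5) / n_speakers * n)] for k in range(n_speakers)]
    v = [sw2] * n_speakers
    r = []
    for _ in range(n_iter):
        # Update q(z): soft labels from the expected log-likelihood under q(y).
        r = []
        for xi in x:
            log_r = [-0.5 * ((xi - m[k]) ** 2 + v[k]) / sw2
                     for k in range(n_speakers)]
            top = max(log_r)  # subtract the maximum for numerical stability
            w = [math.exp(l - top) for l in log_r]
            total = sum(w)
            r.append([wk / total for wk in w])
        # Update q(y_k): Gaussian posterior given the soft counts and sums.
        for k in range(n_speakers):
            n_k = sum(ri[k] for ri in r)
            s_k = sum(ri[k] * xi for ri, xi in zip(r, x))
            v[k] = 1.0 / (1.0 / sb2 + n_k / sw2)
            m[k] = v[k] * (mu / sb2 + s_k / sw2)
    return r, m, v


# Synthetic unlabeled data from two well-separated "speakers".
rng = random.Random(0)
x = [rng.gauss(-5.0, 1.0) for _ in range(20)] + \
    [rng.gauss(5.0, 1.0) for _ in range(20)]
r, m, v = vb_unlabeled_speakers(x, n_speakers=2)
labels = [max(range(2), key=lambda k: ri[k]) for ri in r]
```

In this sketch the two updates alternate until the soft labels stabilise. A full treatment along the lines described in the abstract would also place the model parameters inside the generative model so that they are adapted together with the labels.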
