Domain adaptation using maximum likelihood linear transformation for PLDA-based speaker verification

Although i-vector/PLDA frameworks trained on large amounts of development data have achieved significant success in speaker recognition, collecting a sufficiently large amount of data for every real application is infeasible. This paper proposes a method for supervised domain adaptation of PLDA in i-vector-based speaker recognition systems that uses abundant mismatched (out-of-domain) data together with a small amount of matched (in-domain) data. The method rests on two assumptions: (1) the between-speaker and within-speaker covariances depend on the domain; (2) features in one domain can be mapped into the other domain by an affine transformation. When training PLDA, maximum likelihood linear transformation (MLLT) is used to infer the relationship between the data of the two domains. The proposed method improves performance over the system without adaptation and, when combined with a score fusion technique, outperforms a conventional method based on linear combination.
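Assumption (2) has a direct consequence for the PLDA parameters. If an in-domain vector is x' = A x + b for an out-of-domain vector x, then the speaker mean maps to A mu + b, and the between-speaker and within-speaker covariances map to A B A^T and A W A^T. The sketch below illustrates this under stated assumptions: it uses a simple two-covariance PLDA model (mean `mu`, between-speaker covariance `B`, within-speaker covariance `W`), which may differ from the exact PLDA variant in the paper. It also takes the affine map `(A, b)` as given, so the MLLT estimation step is not shown. All function names and numbers are hypothetical and chosen for illustration.

```python
import math

Matrix = list[list[float]]
Vector = list[float]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(row) for row in zip(*a)]


def apply_affine(A: Matrix, b: Vector, x: Vector) -> Vector:
    """Hypothetical domain map: x' = A x + b."""
    return [sum(A[i][k] * x[k] for k in range(len(x))) + b[i] for i in range(len(A))]


def gauss_logpdf(x: Vector, mean: Vector, cov: Matrix) -> float:
    """log N(x; mean, cov) via a Cholesky factorisation cov = L L^T."""
    n = len(x)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            acc = cov[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(acc) if i == j else acc / L[j][j]
    # Forward substitution L z = (x - mean); z.z is then the Mahalanobis term.
    d = [xi - mi for xi, mi in zip(x, mean)]
    z = [0.0] * n
    for i in range(n):
        z[i] = (d[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    logdet = 2.0 * sum(math.log(L[i][i]) for i in range(n))
    return -0.5 * (n * math.log(2.0 * math.pi) + logdet + sum(v * v for v in z))


def plda_llr(x1: Vector, x2: Vector, mu: Vector, B: Matrix, W: Matrix) -> float:
    """Two-covariance PLDA score: log p(x1, x2 | same) - log p(x1, x2 | different)."""
    d = len(mu)
    T = [[B[i][j] + W[i][j] for j in range(d)] for i in range(d)]  # total covariance B + W
    # Same speaker: the stacked vector [x1; x2] has covariance [[T, B], [B, T]].
    joint = [T[i] + B[i] for i in range(d)] + [B[i] + T[i] for i in range(d)]
    same = gauss_logpdf(x1 + x2, mu + mu, joint)
    diff = gauss_logpdf(x1, mu, T) + gauss_logpdf(x2, mu, T)
    return same - diff


def transform_plda(A: Matrix, b: Vector, mu: Vector, B: Matrix, W: Matrix):
    """Carry PLDA parameters through x' = A x + b (speaker and residual terms map linearly)."""
    At = transpose(A)
    return apply_affine(A, b, mu), matmul(matmul(A, B), At), matmul(matmul(A, W), At)


# Illustrative (made-up) out-of-domain model, domain map, and trial.
mu_out = [0.0, 0.0]
B_out = [[2.0, 0.3], [0.3, 1.0]]
W_out = [[0.5, 0.1], [0.1, 0.4]]
A = [[1.2, 0.2], [-0.1, 0.9]]
b = [0.5, -0.3]
x1, x2 = [1.0, 0.4], [0.8, 0.7]

mu_in, B_in, W_in = transform_plda(A, b, mu_out, B_out, W_out)
score_out = plda_llr(x1, x2, mu_out, B_out, W_out)
score_in = plda_llr(apply_affine(A, b, x1), apply_affine(A, b, x2), mu_in, B_in, W_in)
# score_out and score_in agree up to floating-point rounding.
```

The two scores coincide because the Jacobian of an invertible affine map cancels between the same-speaker and different-speaker likelihoods. Under assumption (2), scoring in-domain trials with the transformed model is therefore equivalent to mapping them back into the out-of-domain space and reusing the out-of-domain PLDA. In the setting the abstract describes, `(A, b)` would be inferred from the two datasets with MLLT rather than fixed by hand as it is here.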
