A transfer learning method for PLDA-based speaker verification

Current state-of-the-art speaker verification systems are based on i-vectors and probabilistic linear discriminant analysis (PLDA). However, training PLDA requires tens of thousands of development samples from many speakers, which makes it difficult to learn the PLDA parameters for a domain with scarce data. In this paper, we propose an effective transfer learning method based on Bayesian joint probability, in which the Kullback-Leibler (KL) divergence between the source domain and the target domain is added as a regularization term. This formulation allows the development data of the source domain to help find better PLDA parameters for the target domain. Experimental results on the NIST SRE and Switchboard corpora show that the proposed method yields the largest performance gain compared with traditional PLDA and another adaptation approach.
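The abstract does not give the exact estimation procedure, so the following is only a minimal sketch of the general idea of KL-regularized estimation, under simplifying assumptions. Each domain is summarized by a single Gaussian; for a two-covariance PLDA model this could be the marginal i-vector distribution, whose covariance is the between-speaker plus the within-speaker covariance. The target-domain objective is assumed to be the average negative log-likelihood of the scarce target data plus `lam` times KL(target || source). The function names, the KL direction, and the weight `lam` are hypothetical choices for illustration, not the paper's method.

```python
import numpy as np


def gaussian_kl(mu0, cov0, mu1, cov1):
    """Closed-form KL( N(mu0, cov0) || N(mu1, cov1) ) for multivariate Gaussians."""
    k = mu0.shape[0]
    cov1_inv = np.linalg.inv(cov1)
    diff = mu1 - mu0
    # slogdet is numerically safer than log(det(...)) for larger dimensions
    _, logdet0 = np.linalg.slogdet(cov0)
    _, logdet1 = np.linalg.slogdet(cov1)
    return 0.5 * (np.trace(cov1_inv @ cov0) + diff @ cov1_inv @ diff - k + logdet1 - logdet0)


def gaussian_nll(x, mu, cov):
    """Average negative log-likelihood of the rows of x under N(mu, cov)."""
    k = mu.shape[0]
    cov_inv = np.linalg.inv(cov)
    _, logdet = np.linalg.slogdet(cov)
    centered = x - mu
    # Per-row Mahalanobis distance: centered_i^T cov_inv centered_i
    mahal = np.einsum("ij,jk,ik->i", centered, cov_inv, centered)
    return 0.5 * np.mean(mahal + logdet + k * np.log(2 * np.pi))


def regularized_objective(x_target, mu_t, cov_t, mu_s, cov_s, lam):
    """Hypothetical objective: target-data NLL plus lam * KL(target Gaussian || source Gaussian)."""
    return gaussian_nll(x_target, mu_t, cov_t) + lam * gaussian_kl(mu_t, cov_t, mu_s, cov_s)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dim = 3
    # Source-domain Gaussian, assumed to be estimated from plentiful source data
    mu_s, cov_s = np.zeros(dim), np.eye(dim)
    # Scarce, synthetic target-domain data
    x_target = rng.normal(0.5, 1.0, size=(20, dim))
    mu_t = x_target.mean(axis=0)
    cov_t = np.cov(x_target, rowvar=False) + 1e-3 * np.eye(dim)
    print(regularized_objective(x_target, mu_t, cov_t, mu_s, cov_s, lam=0.5))
```

In this sketch, the KL term penalizes target-domain parameters that drift far from the source-domain model, so a larger `lam` leans more heavily on the source domain when target data is scarce.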
