Local training for PLDA in speaker verification

PLDA is a popular normalization approach for the i-vector model, and it has delivered state-of-the-art performance in speaker verification. However, PLDA training requires a large amount of labeled development data, which is expensive to obtain in most cases. One way to mitigate this problem is unsupervised adaptation, which uses unlabeled data to adapt the PLDA scatter matrices to the target domain. In this paper, we present a new ‘local training’ approach that uses inaccurate but much cheaper local labels to train the PLDA model. These local labels discriminate speakers within a single conversation only, and so are much easier to obtain than the usual ‘global labels’. Our experiments show that the proposed approach delivers a significant performance improvement, particularly when globally-labeled data is limited.
