论文信息 - Weakly Supervised PLDA Training

Weakly Supervised PLDA Training

PLDA is a popular normalization approach for the i-vector model, and it has delivered state-of-the-art performance in speaker verification. However, PLDA training requires a large amount of labelled development data, which is highly expensive in most cases. We present a cheap PLDA training approach, which assumes that speakers in the same session can be easily separated, and speakers in different sessions are simply different. This results in `weak labels' which are not fully accurate but cheap, leading to a weak PLDA training. Our experimental results on real-life large-scale telephony customer service achieves demonstrated that the weak training can offer good performance when human-labelled data are limited. More interestingly, the weak training can be employed as a discriminative adaptation approach, which is more efficient than the prevailing unsupervised method when human-labelled data are insufficient.

Dong Wang | Yixiang Chen | Lantian Li | Chenghui Zhao

[1] Patrick Kenny,et al. Bayesian Speaker Verification with Heavy-Tailed Priors , 2010, Odyssey.

[2] Alan McCree,et al. Improving speaker recognition performance in the domain adaptation challenge using deep neural networks , 2014, 2014 IEEE Spoken Language Technology Workshop (SLT).

[3] John H. L. Hansen,et al. Utilization of unlabeled development data for speaker verification , 2014, 2014 IEEE Spoken Language Technology Workshop (SLT).

[4] Sridha Sridharan,et al. Dataset-invariant covariance normalization for out-domain PLDA speaker verification , 2015, INTERSPEECH.

[5] Hitoshi Yamamoto,et al. Domain adaptation using maximum likelihood linear transformation for PLDA-based speaker verification , 2016, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[6] Vincent M. Stanford,et al. The 2021 NIST Speaker Recognition Evaluation , 2022, Odyssey.

[7] Yun Lei,et al. A novel scheme for speaker recognition using a phonetically-aware deep neural network , 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[8] James H. Elder,et al. Probabilistic Linear Discriminant Analysis for Inferences About Identity , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[9] David Miller,et al. The Fisher Corpus: a Resource for the Next Generations of Speech-to-Text , 2004, LREC.

[10] Patrick Kenny,et al. Front-End Factor Analysis for Speaker Verification , 2011, IEEE Transactions on Audio, Speech, and Language Processing.

[11] Daniel Garcia-Romero,et al. Analysis of i-vector Length Normalization in Speaker Recognition Systems , 2011, INTERSPEECH.

[12] Eduardo Lleida,et al. Unsupervised adaptation of PLDA by using variational Bayes methods , 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[13] Yonghong Yan,et al. Sparse Probabilistic Linear Discriminant Analysis for Speaker Verification , 2012, INTERSPEECH.