Towards a Fault-Tolerant Speaker Verification System: A Regularization Approach to Reduce the Condition Number

Large-scale deployment of speech interaction devices makes it possible to harvest tremendous amounts of data quickly, but it also introduces the problem of wrong labels produced during data mining. Mislabeled training data has a substantial negative effect on the performance of speaker verification systems. This study aims to enhance the generalization ability and robustness of the model when the training data is contaminated by wrong labels. Several regularization approaches are proposed to reduce the condition number of the speaker verification problem, making the model less sensitive to errors in its inputs. They are validated on both the NIST SRE corpus and far-field smart speaker data. The results suggest that the performance deterioration caused by mislabeled training data can be significantly ameliorated by proper regularization.
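The link between regularization and the condition number can be illustrated with a toy example. The sketch below is not the method proposed in the paper; it assumes a generic ridge-style (Tikhonov) penalty applied to a hypothetical 2x2 Gram matrix of two nearly collinear features, and the function names are made up for illustration. It only shows that adding a small multiple of the identity shrinks the ratio of the largest to the smallest eigenvalue, which is what makes a solution less sensitive to perturbations in the data.

```python
import math

def cond_sym_2x2(a, b, c):
    """2-norm condition number of the symmetric positive-definite
    matrix [[a, b], [b, c]], computed from its closed-form eigenvalues."""
    mean = (a + c) / 2.0
    radius = math.sqrt(((a - c) / 2.0) ** 2 + b ** 2)
    lam_max, lam_min = mean + radius, mean - radius
    return lam_max / lam_min

def ridge_cond(a, b, c, lam):
    """Condition number after adding lam * I (a ridge-style penalty,
    assumed here purely for illustration)."""
    return cond_sym_2x2(a + lam, b, c + lam)

# Hypothetical Gram matrix of two almost collinear features.
a, b, c = 1.0, 0.99, 1.0

plain = cond_sym_2x2(a, b, c)          # eigenvalues 1.99 and 0.01
regularized = ridge_cond(a, b, c, 0.1) # eigenvalues 2.09 and 0.11

print(f"condition number without penalty: {plain:.1f}")
print(f"condition number with lam = 0.1:  {regularized:.1f}")
```

In this toy case the condition number drops from about 199 to about 19. The penalty weight 0.1 is arbitrary; a larger value lowers the condition number further at the cost of more bias.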
