Clustering-Based Score Normalization for Speaker Verification

Score normalization can improve speaker verification (SV) performance by adjusting the distribution of test scores so that it more closely follows a normal distribution. In this paper, all of the imposter scores for the target speakers are first obtained from the normalization cohort; these scores are then clustered by an unsupervised clustering algorithm, and Gaussian mixture models (GMMs) are used to fit the score distribution. The mean and the standard deviation of the Gaussian component with the largest mean are then used to normalize the SV score. Experiments are carried out on the NIST SRE 2016 test set and the VOiCES test set. Compared with conventional score normalization methods, the proposed method can effectively improve SV performance.
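The pipeline described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the clustering algorithm, the number of mixture components, or the exact normalization formula, so the sketch assumes a two-component one-dimensional GMM fitted directly with EM, and a Z-norm-style formula (score minus mean, divided by standard deviation) using the component with the largest mean. The function names `fit_gmm_1d` and `normalize_score` and the synthetic cohort scores are hypothetical.

```python
import math
import random


def fit_gmm_1d(scores, n_components=2, n_iter=100, var_floor=1e-6):
    """Fit a 1-D Gaussian mixture with EM; returns (weights, means, stds)."""
    xs = sorted(scores)
    n = len(xs)
    # Deterministic initialization (an assumption): means at evenly spaced
    # quantiles, every component starting from the overall variance.
    means = [xs[int((k + 0.5) * n / n_components)] for k in range(n_components)]
    mean_all = sum(xs) / n
    var_all = sum((x - mean_all) ** 2 for x in xs) / n
    variances = [max(var_all, var_floor)] * n_components
    weights = [1.0 / n_components] * n_components

    for _ in range(n_iter):
        # E-step: posterior probability of each component for each score.
        resp = []
        for x in xs:
            p = [w * math.exp(-0.5 * (x - m) ** 2 / v) / math.sqrt(2 * math.pi * v)
                 for w, m, v in zip(weights, means, variances)]
            total = sum(p) or 1e-300  # guard against numerical underflow
            resp.append([pk / total for pk in p])
        # M-step: re-estimate weight, mean, and variance of each component.
        for k in range(n_components):
            nk = sum(r[k] for r in resp)
            if nk < 1e-12:
                continue  # leave an empty component unchanged
            weights[k] = nk / n
            means[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var_k = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, xs)) / nk
            variances[k] = max(var_k, var_floor)

    return weights, means, [math.sqrt(v) for v in variances]


def normalize_score(raw_score, cohort_scores, n_components=2):
    """Normalize a raw score with the statistics of the highest-mean component."""
    _, means, stds = fit_gmm_1d(cohort_scores, n_components)
    top = max(range(n_components), key=lambda k: means[k])
    # Assumed Z-norm-style formula; the abstract does not state the exact form.
    return (raw_score - means[top]) / stds[top]


# Synthetic imposter scores (illustrative only): a large group of low scores
# and a smaller group of higher, more confusable scores.
rng = random.Random(0)
cohort = ([rng.gauss(-2.0, 0.5) for _ in range(300)]
          + [rng.gauss(1.0, 0.3) for _ in range(100)])
print(normalize_score(2.5, cohort))
```

In this sketch the low-scoring component is ignored, so the normalization statistics come only from the cohort scores closest to the target speaker. That is one plausible reading of why the component with the largest mean is selected.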
