Decision making based on cohort scores for speaker verification

Decision making is an important component in a speaker verification system. For the conventional GMM-UBM architecture, the decision is usually conducted based on the log likelihood ratio of the test utterance against the GMM of the claimed speaker and the UBM. This single-score decision is simple but tends to be sensitive to the complex variations in speech signals (e.g. text content, channel, speaking style, etc.). In this paper, we propose a decision making approach based on multiple scores derived from a set of cohort GMMs (cohort scores). Importantly, these cohort scores are not simply averaged as in conventional cohort methods; instead, we employ a powerful discriminative model as the decision maker. Experimental results show that the proposed method delivers substantial performance improvement over the baseline system, especially when a deep neural network (DNN) is used as the decision maker, and the DNN input involves some statistical features derived from the cohort scores.

[1]  Roland Auckenthaler,et al.  Score Normalization for Text-Independent Speaker Verification Systems , 2000, Digit. Signal Process..

[2]  Robert I. Damper,et al.  Impostor cohort selection for score normalisation in speaker verification , 1997, Pattern Recognit. Lett..

[3]  Andreas Stolcke,et al.  Within-class covariance normalization for SVM-based speaker recognition , 2006, INTERSPEECH.

[4]  Douglas E. Sturim,et al.  Support vector machines using GMM supervectors for speaker verification , 2006, IEEE Signal Processing Letters.

[5]  Jr. J.P. Campbell,et al.  Speaker recognition: a tutorial , 1997, Proc. IEEE.

[6]  Themos Stafylakis,et al.  Deep Neural Networks for extracting Baum-Welch statistics for Speaker Recognition , 2014, Odyssey.

[7]  Patrick Kenny,et al.  Joint Factor Analysis Versus Eigenchannels in Speaker Recognition , 2007, IEEE Transactions on Audio, Speech, and Language Processing.

[8]  R. Gray,et al.  Vector quantization , 1984, IEEE ASSP Magazine.

[9]  James H. Elder,et al.  Probabilistic Linear Discriminant Analysis for Inferences About Identity , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[10]  Erik McDermott,et al.  Deep neural networks for small footprint text-dependent speaker verification , 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[11]  Biing-Hwang Juang,et al.  The use of cohort normalized scores for speaker verification , 1992, ICSLP.

[12]  Patrick Kenny,et al.  Front-End Factor Analysis for Speaker Verification , 2011, IEEE Transactions on Audio, Speech, and Language Processing.

[13]  Geoffrey E. Hinton,et al.  Visualizing Data using t-SNE , 2008 .

[14]  Douglas A. Reynolds,et al.  Speaker Verification Using Adapted Gaussian Mixture Models , 2000, Digit. Signal Process..