Studies on Model Distance Normalization Approach in Text-independent Speaker Verification

Abstract Model distance normalization (D-Norm) is a useful score normalization approach for automatic speaker verification (ASV) systems. Its main advantage is that, unlike other state-of-the-art score normalization approaches, it requires neither additional speech data nor an external speaker population. It still has drawbacks: for example, the Monte-Carlo-based estimation of the Kullback-Leibler (KL) distance in conventional D-Norm is time-consuming and computationally costly. This paper investigates D-Norm and explores its principles from a perspective different from the original one. It also proposes a simplified approach to D-Norm that uses the upper bound of the KL divergence between two statistical speaker models as the measure of model distance. Experiments on the NIST 2006 SRE corpus show that the simplified D-Norm achieves performance similar to the conventional approach while greatly reducing computational complexity.
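The contrast between the two distance estimates can be illustrated with a minimal sketch. It assumes diagonal-covariance Gaussian mixture speaker models whose components are paired by index (as when both models are adapted from a common background model), and it uses the well-known component-matched upper bound on the KL divergence that follows from the log-sum inequality. The abstract does not state the paper's exact bound or its symmetrization, so this is an illustration under those assumptions rather than the authors' formulation; all function names are hypothetical.

```python
import numpy as np

def log_gmm_pdf(x, weights, means, variances):
    """Log-density of a diagonal-covariance GMM at points x of shape (n, d)."""
    diff = x[:, None, :] - means[None, :, :]                      # (n, k, d)
    log_comp = -0.5 * (np.sum(diff ** 2 / variances, axis=2)
                       + np.sum(np.log(2 * np.pi * variances), axis=1))
    a = log_comp + np.log(weights)                                # (n, k)
    m = a.max(axis=1, keepdims=True)                              # log-sum-exp
    return (m + np.log(np.exp(a - m).sum(axis=1, keepdims=True)))[:, 0]

def sample_gmm(rng, n, weights, means, variances):
    """Draw n samples from a diagonal-covariance GMM."""
    comp = rng.choice(len(weights), size=n, p=weights)
    noise = rng.standard_normal((n, means.shape[1]))
    return means[comp] + np.sqrt(variances[comp]) * noise

def kl_monte_carlo(rng, n, f, g):
    """Monte-Carlo estimate of KL(f || g): average log-ratio over samples from f."""
    x = sample_gmm(rng, n, *f)
    return float(np.mean(log_gmm_pdf(x, *f) - log_gmm_pdf(x, *g)))

def kl_diag_gauss(m0, v0, m1, v1):
    """Closed-form KL between diagonal Gaussians, computed per component."""
    return 0.5 * np.sum(np.log(v1 / v0) + (v0 + (m0 - m1) ** 2) / v1 - 1.0, axis=-1)

def kl_matched_bound(f, g):
    """Upper bound on KL(f || g), assuming component i of f pairs with component i of g."""
    wf, mf, vf = f
    wg, mg, vg = g
    return float(np.sum(wf * (np.log(wf / wg) + kl_diag_gauss(mf, vf, mg, vg))))

# Toy models (illustrative values only): same weights and variances, shifted means.
weights = np.array([0.5, 0.5])
variances = np.ones((2, 2))
f = (weights, np.array([[0.0, 0.0], [3.0, 3.0]]), variances)
g = (weights, np.array([[0.5, 0.0], [3.0, 3.5]]), variances)

rng = np.random.default_rng(0)
mc_estimate = kl_monte_carlo(rng, 20000, f, g)   # needs many samples and density evaluations
upper_bound = kl_matched_bound(f, g)             # closed form, no sampling
```

The Monte-Carlo estimate costs one full mixture evaluation per sample for each model, whereas the bound needs only a single pass over the paired components. A symmetric model distance, if required, could be formed by adding the bound computed in both directions.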
