A Text-Independent Speaker Verification System Based on Cross Entropy

This paper presents an information-theoretic method for estimating the distortion between an enrolled speaker's model and a test utterance in a speaker verification system. It uses the cross entropy (CE) to compute the distance between two parametric models (such as GMMs). Unlike the traditional average log-likelihood method, it takes into account the symmetry between the test utterance and the reference model. In the verification phase, ZT-norm is used to compensate for session variability. Experimental results on the TIMIT database show that the proposed method reduces error rates relative to standard log-likelihood scoring.
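To make the scoring idea concrete, the following is a minimal sketch of a symmetric cross-entropy score between two GMMs. It is not the paper's exact formulation: it assumes 1-D mixtures, a Monte Carlo estimate of the cross entropy, and a symmetric score formed by summing both directions. The names `gmm_logpdf`, `gmm_sample`, `cross_entropy`, and `symmetric_ce_score` are hypothetical, and each model is represented as a `(weights, means, variances)` tuple for illustration only.

```python
import math
import random


def gmm_logpdf(x, weights, means, variances):
    """Log density of a 1-D Gaussian mixture at x (log-sum-exp for stability)."""
    logs = [
        math.log(w) - 0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
        for w, m, v in zip(weights, means, variances)
    ]
    top = max(logs)
    return top + math.log(sum(math.exp(val - top) for val in logs))


def gmm_sample(n, weights, means, variances, rng):
    """Draw n samples: pick a component by weight, then sample its Gaussian."""
    comps = rng.choices(range(len(weights)), weights=weights, k=n)
    return [rng.gauss(means[c], math.sqrt(variances[c])) for c in comps]


def cross_entropy(p, q, n=20000, seed=0):
    """Monte Carlo estimate of H(p, q) = -E_{x ~ p}[log q(x)]."""
    rng = random.Random(seed)
    samples = gmm_sample(n, *p, rng)
    return -sum(gmm_logpdf(x, *q) for x in samples) / n


def symmetric_ce_score(speaker, test, n=20000, seed=0):
    """Assumed symmetric score: negated sum of both CE directions.

    Higher values mean the two models are closer. The average
    log-likelihood baseline corresponds to only one direction
    (test data scored against the speaker model).
    """
    forward = cross_entropy(test, speaker, n, seed)       # test -> speaker
    backward = cross_entropy(speaker, test, n, seed + 1)  # speaker -> test
    return -(forward + backward)


# Illustrative models (made-up parameters, not from the paper).
speaker_model = ([0.5, 0.5], [-1.0, 1.0], [1.0, 1.0])
target_test = ([0.5, 0.5], [-1.1, 0.9], [1.0, 1.0])
impostor_test = ([0.5, 0.5], [3.0, 5.0], [1.0, 1.0])

target_score = symmetric_ce_score(speaker_model, target_test)
impostor_score = symmetric_ce_score(speaker_model, impostor_test)
```

Under these assumptions the target trial should score higher than the impostor trial. In a full system the raw score would still be normalized (for example with ZT-norm) before being compared against a decision threshold.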
