Improvements in MLLR-Transform-based Speaker Recognition

We previously proposed the use of MLLR transforms derived from a speech recognition system as speaker features in a speaker verification system. In this paper we report recent improvements to this approach. First, we noticed a fundamental problem in our previous implementation that stemmed from a mismatch between male and female recognition models, and the model transforms they produce. Although it affects only a small percentage of verification trials (those in which the gender detector commits errors), this mismatch has a large effect on average system performance. We solve this problem by consistently using only one recognition model (either male or female) in computing speaker adaptation transforms regardless of estimated speaker gender. A further accuracy boost is obtained by combining feature vectors derived from male and female vectors into one larger feature vector. Using 1-conversation-side training, the final system has about 27% lower decision cost than a state-of-the-art ccpstral GMM speaker system, and 53% lower decision cost when trained on 8 conversation sides per speaker

[1]  Roland Auckenthaler,et al.  Score Normalization for Text-Independent Speaker Verification Systems , 2000, Digit. Signal Process..

[2]  Philip C. Woodland,et al.  Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models , 1995, Comput. Speech Lang..

[3]  Andreas Stolcke,et al.  The Contribution of Cepstral and Stylistic Features to SRI's 2005 NIST Speaker Recognition Evaluation System , 2006, 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings.

[4]  Francis Kubala,et al.  Fast Robust Inverse Transform SAT and Multi-stage Adaptation , 1998 .

[5]  William M. Campbell,et al.  Generalized linear discriminant sequence kernels for speaker recognition , 2002, 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[6]  Katharina Morik,et al.  Combining Statistical Learning with a Knowledge-Based Approach - A Case Study in Intensive Care Monitoring , 1999, ICML.

[7]  A. Stolcke,et al.  Combining feature sets with support vector machines: application to speaker recognition , 2005, IEEE Workshop on Automatic Speech Recognition and Understanding, 2005..

[8]  Andreas Stolcke,et al.  Generalized Linear Kernels for One-Versus-All Classification: Application to Speaker Recognition , 2006, 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings.

[9]  S.S. Kajarekar Four weightings and a fusion: a cepstral-SVM system for speaker recognition , 2005, IEEE Workshop on Automatic Speech Recognition and Understanding, 2005..

[10]  Douglas A. Reynolds,et al.  Speaker Verification Using Adapted Gaussian Mixture Models , 2000, Digit. Signal Process..

[11]  Andreas Stolcke,et al.  SRI's 2004 NIST speaker recognition evaluation system , 2005, Proceedings. (ICASSP '05). IEEE International Conference on Acoustics, Speech, and Signal Processing, 2005..

[12]  Andreas Stolcke,et al.  MLLR transforms as features in speaker recognition , 2005, INTERSPEECH.

[13]  Douglas A. Reynolds,et al.  Channel robust speaker verification via feature mapping , 2003, 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03)..

[14]  Alvin F. Martin,et al.  Conversational Telephone Speech Corpus Collection for the NIST Speaker Recognition Evaluation 2004 , 2004, LREC.