Unsupervised online adaptation for speaker verification over the telephone

This paper presents experiments on unsupervised adaptation for a speaker detection system. The system is a standard speaker verification system based on cepstral features and Gaussian mixture models. Experiments were performed on cellular speech data taken from the NIST 2002 speaker detection evaluation, comprising about 30,000 trials involving 330 target speakers, of which more than 90% were impostor trials. Unsupervised adaptation significantly increases system accuracy, reducing the minimum detection cost function (DCF) from 0.33 for the baseline system to 0.25 with unsupervised online adaptation. Two incremental adaptation modes were tested: one applies a fixed decision threshold to decide whether to adapt, and the other weights the adaptation by the a posteriori probability that the trial comes from the true target. Both methods give similar results in their best configurations, but the latter is less sensitive to the actual threshold value.
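The two incremental adaptation modes described above can be illustrated with a minimal sketch. The abstract does not give the update equations, so everything below is an assumption made for illustration: the verification score is treated as a log-likelihood ratio, the posterior is obtained from a logistic function of the score, and the model update is a relevance-factor MAP update of a single one-dimensional Gaussian mean. The names `adaptation_weight`, `map_update_mean`, `prior_target`, and `relevance` are hypothetical and are not taken from the paper.

```python
import math


def _sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def adaptation_weight(llr, threshold, mode="hard", prior_target=0.5):
    """Weight in [0, 1] given to a test segment when updating the target model.

    llr: verification score, ASSUMED here to be a log-likelihood ratio.
    mode="hard": adapt fully if the score passes the threshold, else not at all.
    mode="soft": weight by an ASSUMED posterior probability of the true target.
    """
    if mode == "hard":
        return 1.0 if llr >= threshold else 0.0
    if mode == "soft":
        prior_log_odds = math.log(prior_target / (1.0 - prior_target))
        return _sigmoid(llr - threshold + prior_log_odds)
    raise ValueError("mode must be 'hard' or 'soft'")


def map_update_mean(prior_mean, frames, weight, relevance=16.0):
    """Relevance-factor MAP update of one 1-D Gaussian mean (illustrative only).

    The effective frame count is scaled by the adaptation weight, so a
    weight of 0 leaves the model unchanged.
    """
    n = weight * len(frames)
    if n == 0:
        return prior_mean
    data_mean = sum(frames) / len(frames)
    alpha = n / (n + relevance)
    return alpha * data_mean + (1.0 - alpha) * prior_mean


# Usage: the same borderline trial under both modes.
frames = [1.0] * 16
w_hard = adaptation_weight(0.9, threshold=1.0, mode="hard")
w_soft = adaptation_weight(0.9, threshold=1.0, mode="soft")
new_mean_hard = map_update_mean(0.0, frames, w_hard)
new_mean_soft = map_update_mean(0.0, frames, w_soft)
```

In this sketch a score just below the threshold contributes nothing in the hard mode but still contributes with a weight a little under 0.5 in the soft mode, so small shifts of the threshold change the update only gradually. This is one plausible reading of why the posterior-weighted mode is reported to be less sensitive to the threshold value.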
