Speaker adaptation in the NIST speaker recognition evaluation 2004

New in the 2004 edition of the NIST Speaker Recognition Evaluation (SRE) was a condition in which unsupervised adaptation of speaker models is allowed. Despite promising results on development test material, adaptation yielded hardly any benefit in the evaluation itself. We analyse why this was the case. It appears that a minimum level of baseline performance is essential before adaptation can improve on the unadapted system. Further, the system should be well calibrated. For the conditions with 8 conversation sides we were able to obtain an improvement from unsupervised adaptation on the NIST 2004 evaluation data, both for a UBM/GMM adaptation methodology and for a novel SVM adaptation methodology. The minimum DCF for a fused system drops from 0.259 for the unadapted condition to 0.231 for the adapted condition.
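As an illustration of the metric quoted above, the following is a minimal sketch of how a minimum detection cost function (DCF) value can be computed from a set of trial scores. It is not the authors' scoring code. The function name `min_dcf`, the cost parameters (C_miss = 10, C_fa = 1, P_target = 0.01, the values conventionally used in NIST SREs of that period) and the normalization by the cost of the best trivial system are assumptions; the abstract does not state which variant underlies the reported figures.

```python
def min_dcf(target_scores, nontarget_scores,
            p_target=0.01, c_miss=10.0, c_fa=1.0, normalize=True):
    """Minimum detection cost over all decision thresholds (illustrative sketch).

    A trial is accepted when its score is >= the threshold.
    The cost parameters are assumed defaults, not taken from the abstract.
    """
    tar = list(target_scores)
    non = list(nontarget_scores)

    # Candidate thresholds: every observed score, plus +inf (reject everything).
    thresholds = sorted(set(tar) | set(non)) + [float("inf")]

    best = float("inf")
    for t in thresholds:
        p_miss = sum(s < t for s in tar) / len(tar)   # targets rejected
        p_fa = sum(s >= t for s in non) / len(non)    # non-targets accepted
        cost = c_miss * p_target * p_miss + c_fa * (1.0 - p_target) * p_fa
        best = min(best, cost)

    if normalize:
        # Divide by the cost of the better of "always reject" / "always accept".
        best /= min(c_miss * p_target, c_fa * (1.0 - p_target))
    return best
```

Because the threshold is chosen after the fact, the minimum DCF ignores calibration. A system can have a low minimum DCF and still perform poorly at the threshold it actually applies when deciding which test segments to adapt on, which is consistent with the abstract's remark that the system should be well calibrated.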
