Comparison of discriminative training methods for speaker verification

Maximum likelihood estimation (MLE) and Bayesian maximum a posteriori (MAP) adaptation of Gaussian mixture models (GMMs) have proven effective and efficient for speaker verification, even though each speaker model is trained using only that speaker's own training utterances. Discriminative criteria aim to increase discriminability by also using out-of-class data. In this paper, we compare three discriminative training methods on the speaker verification task: the maximum mutual information (MMI), minimum classification error (MCE), and figure of merit (FOM) criteria. Experiments on the 1996 NIST speaker recognition evaluation data set show that the FOM training method outperforms the other two methods for speaker verification. In addition, logistic regression is investigated and successfully employed as a discriminative score-normalization technique.
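As an illustration of the last point, the following is a minimal sketch of logistic regression used as a score normalizer. It is not the paper's implementation: the function names, the toy scores, the learning rate, and the choice of plain batch gradient descent are all assumptions made for the example. It maps a raw verification score s to sigmoid(a*s + b), with a and b fitted on both target and impostor scores, which is what makes the normalization discriminative.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic_normalizer(scores, labels, lr=0.1, epochs=2000):
    """Fit p(target | s) = sigmoid(a * s + b) by batch gradient descent
    on the cross-entropy loss. Hypothetical helper, not from the paper."""
    a, b = 0.0, 0.0
    n = len(scores)
    for _ in range(epochs):
        grad_a = 0.0
        grad_b = 0.0
        for s, y in zip(scores, labels):
            err = sigmoid(a * s + b) - y  # derivative of the loss w.r.t. the logit
            grad_a += err * s
            grad_b += err
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b


def normalize(score, a, b):
    """Map a raw verification score to a value in (0, 1)."""
    return sigmoid(a * score + b)


# Toy raw scores (assumed values): label 1 = target trial, 0 = impostor trial.
target_scores = [1.2, 0.8, 1.5, 0.3, 0.9]
impostor_scores = [-1.0, -0.4, 0.1, -1.3, -0.7]
scores = target_scores + impostor_scores
labels = [1] * len(target_scores) + [0] * len(impostor_scores)

a, b = fit_logistic_normalizer(scores, labels)
```

After fitting, `normalize(s, a, b)` can be compared against a single speaker-independent threshold such as 0.5, rather than tuning a threshold on the raw scores.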
