Improving speaker verification with figure of merit training

This paper proposes Figure of Merit (FOM) training, a novel discriminative training method for Gaussian mixture models in text-independent speaker verification. FOM training adjusts the model parameters to maximize the FOM of the ROC curve, rather than only approximating the underlying distribution of each speaker's acoustic observations, as Maximum Likelihood Estimation does. Text-independent speaker verification experiments were conducted on the 1996 NIST Speaker Recognition Evaluation corpus. Compared with the standard EM training method, FOM training significantly improved performance: the detection cost function (DCF) fell from 0.0369 to 0.0286 in the matched condition and from 0.0826 to 0.0537 in the mismatched condition.
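To put the reported DCF figures in context, the following is a minimal sketch of how a detection cost is commonly computed from verification scores, and of the relative reductions implied by the numbers above. The cost parameters (C_miss = 10, C_fa = 1, P_target = 0.01) are assumed from the usual NIST evaluation convention and are not stated in the abstract. The function names and the toy scores are hypothetical.

```python
def detection_cost(target_scores, impostor_scores, threshold,
                   c_miss=10.0, c_fa=1.0, p_target=0.01):
    """Detection cost at a fixed decision threshold.

    Cost parameters are assumed defaults (common NIST convention),
    not values taken from the abstract.
    """
    # A miss: a true-speaker trial scored below the threshold.
    p_miss = sum(s < threshold for s in target_scores) / len(target_scores)
    # A false alarm: an impostor trial scored at or above the threshold.
    p_fa = sum(s >= threshold for s in impostor_scores) / len(impostor_scores)
    return c_miss * p_miss * p_target + c_fa * p_fa * (1.0 - p_target)


def relative_reduction(baseline, improved):
    """Fractional reduction of `improved` relative to `baseline`."""
    return (baseline - improved) / baseline


# Toy scores (hypothetical), only to exercise the function.
toy_dcf = detection_cost([2.0, 1.0, -1.0, 3.0], [-2.0, -1.5, 0.5, -3.0], 0.0)

# Relative DCF reductions implied by the figures reported in the abstract.
matched = relative_reduction(0.0369, 0.0286)      # about 22.5 %
mismatched = relative_reduction(0.0826, 0.0537)   # about 35.0 %
print(f"matched: {matched:.1%}, mismatched: {mismatched:.1%}")
```

Under these assumptions, the reported results correspond to a relative DCF reduction of roughly 22% in the matched condition and roughly 35% in the mismatched condition.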
