A Generative-Discriminative Framework using Ensemble Methods for Text-Dependent Speaker Verification

Speaker verification can be treated as a statistical hypothesis testing problem. The most commonly used approach is the likelihood ratio test (LRT), which can be shown to be optimal by the Neyman-Pearson lemma. In most practical situations, however, the lemma's assumptions do not hold, because the true speaker and impostor distributions are unknown and must be estimated. In this paper, we present a more robust approach for text-dependent speaker verification that uses a hybrid generative-discriminative framework. Our algorithm uses generative models to learn the characteristics of a speaker, and then uses discriminative models to separate that speaker from impostors. One advantage of the proposed algorithm is that the generative model does not need to be retrained. On average, the proposed model yields a 36.41% relative improvement in equal error rate (EER) over an LRT.
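
The LRT baseline and the EER metric mentioned above can be sketched as follows. This is a minimal sketch under assumptions that the abstract does not state:

- Both the claimed speaker and the impostor (background) population are modeled by diagonal-covariance Gaussian mixture models over per-frame feature vectors. Text-dependent systems often use HMMs instead.
- The utterance score is the frame-averaged log-likelihood ratio.
- The EER is approximated by sweeping the decision threshold over the observed scores.

All function names and the model format are hypothetical. The sketch does not implement the proposed generative-discriminative ensemble.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical model format (an assumption, not from the paper): a
# diagonal-covariance GMM as a list of (weight, means, variances) components.
Component = Tuple[float, Sequence[float], Sequence[float]]
GMM = List[Component]


def gmm_frame_loglik(x: Sequence[float], gmm: GMM) -> float:
    """log p(x | GMM) for one feature frame, combined with log-sum-exp."""
    comp_logs = []
    for weight, means, variances in gmm:
        ll = math.log(weight)
        for xi, mi, vi in zip(x, means, variances):
            ll -= 0.5 * (math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi)
        comp_logs.append(ll)
    peak = max(comp_logs)
    return peak + math.log(sum(math.exp(c - peak) for c in comp_logs))


def llr_score(frames: Sequence[Sequence[float]], speaker: GMM, background: GMM) -> float:
    """Frame-averaged log-likelihood ratio: log p(X|speaker) - log p(X|impostor)."""
    if not frames:
        raise ValueError("utterance has no frames")
    total = sum(gmm_frame_loglik(x, speaker) - gmm_frame_loglik(x, background)
                for x in frames)
    return total / len(frames)


def lrt_accept(score: float, threshold: float = 0.0) -> bool:
    """LRT decision: accept the claimed identity when the score clears the threshold."""
    return score >= threshold


def equal_error_rate(genuine: Sequence[float], impostor: Sequence[float]) -> float:
    """Approximate EER: sweep the threshold over every observed score and return
    the mean of FAR and FRR at the threshold where the two rates are closest."""
    best_gap, best_eer = float("inf"), 1.0
    for t in sorted(set(genuine) | set(impostor)):
        far = sum(s >= t for s in impostor) / len(impostor)  # impostors accepted
        frr = sum(s < t for s in genuine) / len(genuine)      # true speakers rejected
        if abs(far - frr) < best_gap:
            best_gap, best_eer = abs(far - frr), (far + frr) / 2.0
    return best_eer


def relative_improvement(eer_baseline: float, eer_new: float) -> float:
    """Relative EER reduction of a new system over a baseline such as the LRT."""
    return (eer_baseline - eer_new) / eer_baseline


# Toy usage: one-dimensional speaker and background models.
spk = [(1.0, [0.0], [1.0])]
bkg = [(1.0, [3.0], [1.0])]
print(lrt_accept(llr_score([[0.0], [0.1]], spk, bkg)))  # True
print(equal_error_rate([2, 3, 4], [-1, 0, 1]))          # 0.0
```

A relative improvement such as the 36.41% quoted above is computed as (EER_LRT - EER_new) / EER_LRT, which is what `relative_improvement` returns. The log-sum-exp step keeps the per-frame mixture likelihood numerically stable when component log-likelihoods are very negative.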
