Speaker Verification Using Adapted Gaussian Mixture Models

Reynolds, Douglas A., Quatieri, Thomas F., and Dunn, Robert B., Speaker Verification Using Adapted Gaussian Mixture Models, Digital Signal Processing10(2000), 19Â?41.In this paper we describe the major elements of MIT Lincoln Laboratory's Gaussian mixture model (GMM)-based speaker verification system used successfully in several NIST Speaker Recognition Evaluations (SREs). The system is built around the likelihood ratio test for verification, using simple but effective GMMs for likelihood functions, a universal background model (UBM) for alternative speaker representation, and a form of Bayesian adaptation to derive speaker models from the UBM. The development and use of a handset detector and score normalization to greatly improve verification performance is also described and discussed. Finally, representative performance benchmarks and system behavior experiments on NIST SRE corpora are presented.

[1]  Richard O. Duda,et al.  Pattern classification and scene analysis , 1974, A Wiley-Interscience publication.

[2]  D. Rubin,et al.  Maximum likelihood from incomplete data via the EM - algorithm plus discussions on the paper , 1977 .

[3]  Aaron E. Rosenberg,et al.  On the use of instantaneous and transitional spectral information in speaker recognition , 1986, ICASSP '86. IEEE International Conference on Acoustics, Speech, and Signal Processing.

[4]  Douglas A. Reynolds,et al.  Text independent speaker identification using automatic acoustic segmentation , 1990, International Conference on Acoustics, Speech, and Signal Processing.

[5]  Lawrence G. Bahler,et al.  Speaker verification using randomized phrase prompting , 1991, Digit. Signal Process..

[6]  Michael J. Carey,et al.  A speaker verification system using alpha-nets , 1991, [Proceedings] ICASSP 91: 1991 International Conference on Acoustics, Speech, and Signal Processing.

[7]  R. J. Lickley,et al.  Proceedings of the International Conference on Spoken Language Processing. , 1992 .

[8]  Douglas A. Reynolds,et al.  A Gaussian mixture modeling approach to text-independent speaker identification , 1992 .

[9]  Hynek Hermansky,et al.  RASTA-PLP speech analysis technique , 1992, [Proceedings] ICASSP-92: 1992 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[10]  Biing-Hwang Juang,et al.  The use of cohort normalized scores for speaker verification , 1992, ICSLP.

[11]  Douglas A. Reynolds,et al.  Experimental evaluation of features for robust speaker identification , 1994, IEEE Trans. Speech Audio Process..

[12]  Chin-Hui Lee,et al.  Maximum a posteriori estimation for multivariate Gaussian mixture observations of Markov chains , 1994, IEEE Trans. Speech Audio Process..

[13]  Douglas A. Reynolds,et al.  Robust text-independent speaker identification using Gaussian mixture speaker models , 1995, IEEE Trans. Speech Audio Process..

[14]  Douglas A. Reynolds,et al.  Speaker identification and verification using Gaussian mixture speaker models , 1995, Speech Commun..

[15]  D. Reynolds Automatic Speaker Recognition Using Gaussian Mixture Speaker Models , 1995 .

[16]  T.F. Quatieri,et al.  The effects of telephone transmission degradations on speaker recognition performance , 1995, 1995 International Conference on Acoustics, Speech, and Signal Processing.

[17]  Sadaoki Furui,et al.  Likelihood normalization for speaker verification using a phoneme- and speaker-independent model , 1995, Speech Commun..

[18]  D. A. Reynolds,et al.  The effects of handset variability on speaker recognition performance: experiments on the Switchboard corpus , 1996, 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing Conference Proceedings.

[19]  Aaron E. Rosenberg,et al.  Speaker background models for connected digit password speaker verification , 1996, 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing Conference Proceedings.

[20]  Don McAllaster,et al.  Speaker verification through large vocabulary continuous speech recognition , 1996, Proceeding of Fourth International Conference on Spoken Language Processing. ICSLP '96.

[21]  Larry P. Heck,et al.  Handset-dependent background models for robust text-independent speaker recognition , 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[22]  Douglas A. Reynolds,et al.  HTIMIT and LLHDB: speech corpora for the study of handset transducer effects , 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[23]  Alvin F. Martin,et al.  The DET curve in assessment of detection task performance , 1997, EUROSPEECH.

[24]  Douglas A. Reynolds,et al.  Comparison of background normalization methods for text-independent speaker verification , 1997, EUROSPEECH.

[25]  Douglas A. Reynolds,et al.  Magnitude-only estimation of handset nonlinearity with application to speaker recognition , 1998, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181).

[26]  Alvin F. Martin,et al.  The 1999 NIST speaker recognition evaluation, using summed two-channel telephone data for speaker detection and speaker tracking , 1999, EUROSPEECH.

[27]  Sarel van Vuuren,et al.  Speaker verification in a time-feature space , 1999 .

[28]  Jun-ichi Takahashi,et al.  Text-independent speaker verification using virtual speaker based cohort normalization , 1999, EUROSPEECH.

[29]  Jean-François Bonastre,et al.  Similarity normalization method based on world model and a posteriori probability for speaker verification , 1999, EUROSPEECH.

[30]  Douglas A. Reynolds,et al.  Approaches to Speaker Detection and Tracking in Conversational Speech , 2000, Digit. Signal Process..

[31]  Douglas A. Reynolds,et al.  The NIST speaker recognition evaluation - Overview, methodology, systems, results, perspective , 2000, Speech Commun..

[32]  Alvin F. Martin,et al.  The NIST 1999 Speaker Recognition Evaluation - An Overview , 2000, Digit. Signal Process..

[33]  D. Reynolds,et al.  Pc-based Tms320c30 Implementation of the Gaussian Mixture Model Text-independent Speaker Recognition System , 2022 .