Speaker verification using frame and utterance level likelihood normalization

We propose a new method in which likelihood normalization is applied at both the frame level and the utterance level. The method is based on Gaussian mixture models (GMMs): every frame of the test utterance is fed in parallel to the claimed speaker model and to all background speaker models. Because the likelihoods from all background models are available for each frame, they can be used to normalize the claimed speaker likelihood frame by frame. We also propose a special kind of likelihood normalization, called weighting models rank. We evaluated the method on two databases, TIMIT and NTT. The results show that combining frame and utterance level likelihood normalization in some cases reduces the equal error rate (EER) by more than a factor of two.
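The frame-level step described above can be illustrated with a minimal sketch. The abstract does not give the normalization formula, so this example assumes one common form: for each frame, the claimed speaker's log-likelihood minus the log of the average background likelihood, with the per-frame values then averaged over the utterance and compared with a threshold. The function names, the diagonal-covariance GMM parameterization, and the threshold are all hypothetical, and neither the utterance-level normalization nor the weighting models rank variant is reproduced here.

```python
import numpy as np


def gmm_frame_loglik(frames, weights, means, variances):
    """Per-frame log-likelihood under a diagonal-covariance GMM.

    frames: (T, D); weights: (M,); means and variances: (M, D).
    Returns an array of shape (T,).
    """
    diff = frames[:, None, :] - means[None, :, :]  # (T, M, D)
    log_comp = -0.5 * (
        np.sum(diff ** 2 / variances, axis=2)
        + np.sum(np.log(2.0 * np.pi * variances), axis=1)
    )  # (T, M)
    # Log-sum-exp over mixture components.
    return np.logaddexp.reduce(np.log(weights) + log_comp, axis=1)


def frame_normalized_score(frames, claimed, background):
    """Average over frames of the frame-normalized log-likelihood.

    ASSUMPTION: each frame is normalized by the mean likelihood of the
    background models; the source does not specify the exact form.
    claimed: (weights, means, variances); background: list of such tuples.
    """
    claimed_ll = gmm_frame_loglik(frames, *claimed)  # (T,)
    bg_ll = np.stack([gmm_frame_loglik(frames, *b) for b in background])  # (B, T)
    # Log of the mean background likelihood for every frame.
    bg_norm = np.logaddexp.reduce(bg_ll, axis=0) - np.log(len(background))
    return float(np.mean(claimed_ll - bg_norm))


def verify(frames, claimed, background, threshold=0.0):
    """Accept the identity claim if the normalized score exceeds the threshold."""
    return frame_normalized_score(frames, claimed, background) > threshold
```

In practice the threshold would be tuned on held-out data, for example at the operating point where false acceptance and false rejection rates are equal.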
