ASR dependent techniques for speaker identification

Traditional text independent speaker recognition systems are based on Gaussian Mixture Models (GMMs) trained globally over all speech from a given speaker. In this paper, we describe alternative methods for performing speaker identification that utilize domain dependent automatic speech recognition (ASR) to provide a phonetic segmentation of the test utterance. When evaluated on YOHO, several of these approaches were able outperform previously published results on the speaker ID task. On a more difficult conversational speech task, we were able to use a combination of classifiers to reduce identification error rates on single test utterances. Over multiple utterances, the ASR dependent approaches performed significantly better than the ASR independent methods. Using an approach we call speaker adaptive modeling for speaker identification, we were able to reduce speaker identification error rates by 39% over a baseline GMM approach when observing five test utterances from a speaker.

[1]  G. Ruske,et al.  Improving Speaker Recognition Performance Using Phonetically Structured Gaussian Mixture Models , 2001 .

[2]  J.P. Eatock,et al.  A quantitative assessment of the relative speaker discriminating properties of phonemes , 1994, Proceedings of ICASSP '94. IEEE International Conference on Acoustics, Speech and Signal Processing.

[3]  Joseph P. Campbell,et al.  Phonetic speaker recognition , 2001, Conference Record of Thirty-Fifth Asilomar Conference on Signals, Systems and Computers (Cat.No.01CH37256).

[4]  Chin-Hui Lee,et al.  Maximum a posteriori estimation for multivariate Gaussian mixture observations of Markov chains , 1994, IEEE Trans. Speech Audio Process..

[5]  George R. Doddington,et al.  Speaker recognition based on idiolectal differences between speakers , 2001, INTERSPEECH.

[6]  Victor Zue,et al.  A Segment-Based Speaker Verification System Using SUMMIT , 2008 .

[7]  Douglas A. Reynolds,et al.  Speaker identification and verification using Gaussian mixture speaker models , 1995, Speech Commun..

[8]  Stephanie Seneff,et al.  Dialogue Management in the Mercury Flight Reservation System , 2000 .

[9]  James R. Glass,et al.  A probabilistic framework for feature-based speech recognition , 1996, Proceeding of Fourth International Conference on Spoken Language Processing. ICSLP '96.

[10]  Larry Gillick,et al.  Speaker Recognition on Single- and Multispeaker Data , 2000, Digit. Signal Process..

[11]  Stéphane H. Maes,et al.  Transformation enhanced multi-grained modeling for text-independent speaker recognition , 2000, INTERSPEECH.

[12]  Joseph P. Campbell Testing with the YOHO CD-ROM voice verification corpus , 1995, 1995 International Conference on Acoustics, Speech, and Signal Processing.