Paper information - Text-constrained speaker recognition on a text-independent task

We present an approach to speaker recognition in the text-independent domain of conversational telephone speech using a text-constrained system designed to exploit select high-frequency keywords in the speech stream. The system uses speaker word models generated via Hidden Markov Models (HMMs), a departure from the traditional Gaussian Mixture Model (GMM) approach dominant in text-independent work but one commonly employed in text-dependent systems, with the expectation that HMMs take greater advantage of sequential information and support more detailed modeling that could aid recognition. Even with a keyword inventory that covers a mere 10% of the word tokens and a system that does not yet incorporate many standard speaker recognition normalization schemes, this approach already achieves equal error rates of 1% on NIST's 2001 Extended Data task.

Barbara Peskin | Kofi Boakye
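
The abstract reports performance as an equal error rate (EER): the operating point at which the miss rate on target trials equals the false-alarm rate on impostor trials. As a minimal sketch of how that figure can be read off a set of trial scores, the Python below sweeps a decision threshold over the observed scores and returns the point where the two error rates are closest. The function name `equal_error_rate` and the example scores are illustrative assumptions; this is not the scoring procedure used in the paper or in the NIST evaluation.

```python
def equal_error_rate(target_scores, impostor_scores):
    """Approximate the EER by sweeping a threshold over all observed scores.

    A trial is accepted when its score is >= the threshold.
    Returns (eer, threshold) at the threshold where the miss rate and the
    false-alarm rate are closest; eer is the mean of the two rates there.
    """
    thresholds = sorted(set(target_scores) | set(impostor_scores))
    best = None  # (gap, eer, threshold)
    for t in thresholds:
        # Miss: a true-speaker trial scored below the threshold.
        miss = sum(s < t for s in target_scores) / len(target_scores)
        # False alarm: an impostor trial scored at or above the threshold.
        false_alarm = sum(s >= t for s in impostor_scores) / len(impostor_scores)
        gap = abs(miss - false_alarm)
        if best is None or gap < best[0]:
            best = (gap, (miss + false_alarm) / 2, t)
    return best[1], best[2]


# Made-up scores, for illustration only (not data from the paper).
targets = [2.0, 1.5, 0.2, 3.0]
impostors = [-1.0, 0.5, -0.3, 1.6]
eer, threshold = equal_error_rate(targets, impostors)
print(f"EER = {eer:.2%} at threshold {threshold}")
```

With only a handful of trials the two error rates may never cross exactly, which is why the sketch reports the midpoint at the closest threshold; evaluation tooling typically interpolates along the DET curve instead.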
