Limited parameter hidden Markov models for connected digit speaker verification over telephone channels

The authors describe a speaker verification system for telephone channels that is based on randomly prompted digit strings and uses concatenated context-dependent phonemic hidden Markov models (HMMs). The main goal of this work was to achieve acceptable speaker verification performance while keeping the number of parameters (and, consequently, the amount of training material) and the CPU requirements relatively small. To optimize the performance of the system, several features were used: context-dependent phoneme models; silence and garbage (click) models to remove extraneous parts from the actual utterance; improved decision logic based on associated speakers; improved feature vectors using RASTA processing; and rejection of garbage utterances without significantly affecting the overall verification performance. It is shown how these features together led to an average equal error rate of 6.3% on realistic and difficult tasks.
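
The equal error rate quoted above is the operating point at which the false-acceptance rate (impostors accepted) equals the false-rejection rate (true speakers rejected). The following is a minimal sketch of how such a figure can be estimated from two lists of verification scores. It is not the authors' evaluation code: the function name `equal_error_rate`, the rule "accept when the score is at or above the threshold", and the example scores are all assumptions made for illustration.

```python
def equal_error_rate(target_scores, impostor_scores):
    """Estimate the equal error rate (EER) from verification scores.

    Assumption: a claim is accepted when its score is >= the threshold,
    so higher scores mean "more likely the claimed speaker".
    Returns (eer, threshold) at the candidate threshold where the
    false-acceptance and false-rejection rates are closest.
    """
    # Every observed score is a candidate decision threshold.
    candidates = sorted(set(target_scores) | set(impostor_scores))
    best = None
    for threshold in candidates:
        # False rejection: a true-speaker score falls below the threshold.
        frr = sum(s < threshold for s in target_scores) / len(target_scores)
        # False acceptance: an impostor score reaches the threshold.
        far = sum(s >= threshold for s in impostor_scores) / len(impostor_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            # Report the midpoint of the two rates as the EER estimate.
            best = (gap, (far + frr) / 2, threshold)
    return best[1], best[2]


# Hypothetical scores, invented for this example only.
target = [2.0, 3.0, 4.0, 5.0]
impostor = [0.0, 1.0, 2.5, 3.5]
eer, threshold = equal_error_rate(target, impostor)
print(f"EER = {eer:.2%} at threshold {threshold}")
```

With real data the two error rates rarely cross exactly at an observed score, which is why the sketch reports the midpoint of the closest pair rather than insisting on equality.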
