An HMM approach to text-prompted speaker verification

This paper presents a speaker recognition system based on hidden Markov models (HMMs). The system uses concatenated phoneme HMMs and operates in a text-prompted mode. Each registered speaker has a separate set of HMMs, trained with the Baum-Welch algorithm. The system has been evaluated on the YOHO voice verification corpus for both speaker verification and closed-set speaker identification. With 10 seconds of test speech, it achieves speaker identification error rates of 0.09% for male and 0.29% for female speakers on a total population of 138 talkers. For speaker verification, under the 0% false rejection condition, it achieves false acceptance rates of 0.09% for male and 0% for female speakers. The paper also studies the effects of various factors, such as the number of mixtures and cohort selection, on speaker recognition performance.
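To make the "0% false rejection condition" concrete, the following is a minimal sketch of one way such an operating point could be computed from verification scores. It assumes that higher scores favor the claimed speaker and that a trial is accepted when its score meets a single global threshold; the function name, the score values, and the thresholding rule are illustrative assumptions, not details taken from the paper.

```python
def false_acceptance_at_zero_rejection(true_scores, impostor_scores):
    """Return (threshold, false_acceptance_rate) at the 0% false rejection point.

    Assumption: a trial is accepted when score >= threshold, and higher
    scores favor the claimed speaker.
    """
    if not true_scores or not impostor_scores:
        raise ValueError("both score lists must be non-empty")
    # The highest threshold that still accepts every true-speaker trial
    # is the lowest true-speaker score.
    threshold = min(true_scores)
    # Count impostor trials that would be (wrongly) accepted at that threshold.
    accepted = sum(1 for s in impostor_scores if s >= threshold)
    return threshold, accepted / len(impostor_scores)


# Hypothetical scores, e.g. cohort-normalized log-likelihoods (made-up values).
true_scores = [2.1, 1.4, 3.0, 0.9]
impostor_scores = [-1.2, 0.3, 1.1, -0.5, -2.0]

threshold, far = false_acceptance_at_zero_rejection(true_scores, impostor_scores)
print(threshold, far)  # prints: 0.9 0.2
```

In this toy example one of the five impostor trials scores above the lowest true-speaker score, so the false acceptance rate at zero false rejection is 20%.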
