Continuous probabilistic acoustic map for speaker identification

A continuous probabilistic acoustic map (CPAM) approach to speaker recognition is investigated. In the CPAM formulation, a speaker's speech input is parameterized as a mixture of tied, universal probability density functions (PDFs), using either a CPAM model alone for text-independent operation or a CPAM-based hidden Markov model (HMM) for text-dependent operation. A database of continuously spoken digits from 20 speakers (10 male, 10 female) is used to evaluate the CPAM approach in both identification and verification performance. The CPAM approach is shown to outperform a vector quantization based approach in text-independent speaker recognition, and to perform as well as the conventional, text-dependent, continuous mixture HMM approach while using a significantly more compact representation. In particular, the CPAM-based HMM achieves an identification error rate of 1.7% and a verification equal-error rate of 4.0% with a CPAM of 128 PDFs, whereas a conventional continuous mixture HMM needs 400 PDFs to achieve corresponding error rates of 1.9% and 4.0%, using the same combined cepstral features and three-digit test utterances.
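The idea of modeling each speaker as a mixture over tied, universal PDFs can be illustrated with a minimal sketch of the text-independent case. The abstract does not specify the PDF family or the training procedure, so everything concrete here is an assumption made for illustration: the shared PDFs are taken to be diagonal-covariance Gaussians, each speaker is represented only by a weight vector over those shared PDFs, the weights are re-estimated with a standard EM update, and the function names (`log_gaussian_diag`, `train_weights`, `identify`) are hypothetical.

```python
import numpy as np

def log_gaussian_diag(frames, means, variances):
    """Log density of every frame under every shared PDF.

    Assumes diagonal-covariance Gaussians (an illustrative choice).
    frames: (T, D), means: (K, D), variances: (K, D) -> result: (T, K)
    """
    diff = frames[:, None, :] - means[None, :, :]
    log_norm = -0.5 * np.sum(np.log(2.0 * np.pi * variances), axis=1)
    return log_norm[None, :] - 0.5 * np.sum(diff ** 2 / variances[None, :, :], axis=2)

def logsumexp(a, axis):
    """Numerically stable log(sum(exp(a))) along one axis."""
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(a - m), axis=axis))

def train_weights(log_pdf, n_iter=10):
    """EM re-estimation of one speaker's mixture weights.

    The shared PDFs stay fixed (tied across speakers); only the
    speaker-specific weights are updated.
    """
    n_pdfs = log_pdf.shape[1]
    weights = np.full(n_pdfs, 1.0 / n_pdfs)
    for _ in range(n_iter):
        # E-step: posterior probability of each PDF for each frame.
        log_post = np.log(weights)[None, :] + log_pdf
        log_post -= logsumexp(log_post, axis=1)[:, None]
        # M-step: new weights are the average posteriors.
        weights = np.exp(log_post).mean(axis=0)
        # Floor and renormalize so no weight becomes exactly zero.
        weights = np.maximum(weights, 1e-10)
        weights /= weights.sum()
    return weights

def utterance_log_likelihood(log_pdf, weights):
    """Average per-frame log likelihood under one speaker's weights."""
    return float(np.mean(logsumexp(np.log(weights)[None, :] + log_pdf, axis=1)))

def identify(frames, means, variances, speaker_weights):
    """Return the best-scoring speaker and all per-speaker scores."""
    # The shared PDFs are evaluated once and reused for every speaker.
    log_pdf = log_gaussian_diag(frames, means, variances)
    scores = {name: utterance_log_likelihood(log_pdf, w)
              for name, w in speaker_weights.items()}
    return max(scores, key=scores.get), scores
```

Because the PDFs are shared, per-speaker storage in this sketch reduces to one weight vector, and the frame-level densities are computed once per utterance regardless of the number of enrolled speakers. A verification decision could be sketched similarly by comparing the claimed speaker's score against a threshold; the text-dependent variant would instead place such tied mixtures inside HMM states, which is not shown here.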
