Combined Speech Recognition and Speaker Verification over the Fixed and Mobile Telephone Networks

A double-digit, text-dependent speaker verification and text validation system is presented for use in telephone services. The system uses concatenated phoneme Hidden Markov Models (HMMs) for both speech recognition and user authentication, and operates in a sound-prompted mode. Tests with HMMs using Perceptual Linear Prediction (PLP) and Mel-Frequency Cepstral Coefficient (MFCC) features, as well as Cepstral Mean Subtraction (CMS), are performed to assess their effect on recognition performance. The paper also studies how several factors affect speech recognition and speaker verification performance: the length of the training data, the number of embedded re-estimations and Gaussian mixtures used in training the HMMs, the use of world models, bootstrapping, and user-dependent thresholds.
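As an illustration of the Cepstral Mean Subtraction step named above, the following is a minimal sketch and not the authors' implementation. It assumes features are stored as a list of per-frame cepstral vectors and that the mean is taken over a single utterance; the function name `cepstral_mean_subtraction` and the toy values are hypothetical.

```python
def cepstral_mean_subtraction(frames):
    """Subtract the per-utterance mean of each cepstral coefficient.

    frames: list of frames, each a list of cepstral coefficients
            (assumed layout; all frames must have the same length).
    """
    num_frames = len(frames)
    num_coeffs = len(frames[0])
    # Mean of each coefficient over all frames of the utterance.
    means = [sum(f[k] for f in frames) / num_frames for k in range(num_coeffs)]
    # Remove that mean from every frame.
    return [[f[k] - means[k] for k in range(num_coeffs)] for f in frames]


# Toy example: a constant offset per coefficient stands in for a fixed
# channel effect (a simplifying assumption about the telephone channel).
clean = [[1.0, 2.0], [3.0, 6.0]]
channel_offset = [0.5, -0.25]
observed = [[c + o for c, o in zip(frame, channel_offset)] for frame in clean]

normalized = cepstral_mean_subtraction(observed)
```

In this toy case the normalized features are the same whether or not the constant offset is present, which is the property that makes the technique attractive when the same speaker calls over different lines.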
