Integrating speaker and speech recognizers: Automatic identity claim capture for speaker verification

This paper presents a novel approach to integrating a speech recognizer and a speaker recognizer in order to capture a user's identity claim automatically. The approach feeds the speaker recognition score into the search process of the speech recognizer, so that the best hypothesis jointly optimizes the probability of the word sequence and of the speaker. This enables a natural speech-based interface in which the identity claim can be ambiguous and relatively difficult to recognize (e.g., names). A theoretical framework for the integration of speech and speaker recognition systems is presented. In addition, experimental results show a 35% reduction in the NL-error rate on an over-the-telephone speech recognition task, where the test set consists of users from a US city with a population of 1 million identifying themselves by simply speaking their name.
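
The idea of jointly scoring the word sequence and the speaker can be illustrated with a minimal sketch. It is not the paper's method: the paper integrates the speaker score inside the recognizer's search, whereas the sketch below only rescores a short list of competing name hypotheses after recognition. The function name `joint_best`, the `weight` and `floor` parameters, and all scores are hypothetical values chosen for illustration.

```python
def joint_best(asr_logprob, speaker_score, weight=1.0, floor=-10.0):
    """Return the (name, total) pair with the highest combined score.

    asr_logprob:   dict mapping a hypothesized name to the speech
                   recognizer's log-probability for that word sequence.
    speaker_score: dict mapping a name to the log-domain score of that
                   person's speaker model on the same utterance.
    weight:        assumed scaling factor for the speaker score.
    floor:         assumed score for names with no enrolled speaker model.
    """
    best_name, best_total = None, float("-inf")
    for name, logprob in asr_logprob.items():
        # Combined score: word-sequence evidence plus speaker evidence.
        total = logprob + weight * speaker_score.get(name, floor)
        if total > best_total:
            best_name, best_total = name, total
    return best_name, best_total


# Illustrative (made-up) scores for three acoustically confusable names.
asr_logprob = {"john smith": -4.1, "jon smyth": -3.9, "joan smith": -5.0}
speaker_score = {"john smith": 1.5, "jon smyth": -2.0, "joan smith": -0.5}

# Speech recognizer alone (weight=0) prefers "jon smyth".
print(joint_best(asr_logprob, speaker_score, weight=0.0))
# Adding the speaker score changes the decision to "john smith".
print(joint_best(asr_logprob, speaker_score))
```

With these made-up numbers, the speech score alone favors one name, while the combined score favors another whose speaker model matches the voice better. This is the kind of ambiguity among similar-sounding names that a joint decision is meant to resolve.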
