Creating speaker-specific phonetic templates with a speaker-independent phonetic recognizer: implications for voice dialing

We present a new approach to speaker-dependent template generation that uses dramatically less storage to represent a speaker's words, with minimal degradation in recognition accuracy. In this approach, each utterance is represented by the symbolic string produced by a speaker-independent phonetic recognizer. We investigate effective procedures for template generation and compare the resulting templates with templates represented by acoustic parameters, for utterances produced with different telephone handsets. Using speaker-specific phonetic templates reduced data-storage requirements by a factor of about 500, with comparable recognition accuracy. We also compare recognition performance for speaker-specific templates, speaker-independent templates, and combinations of the two. Combining speaker-specific and speaker-independent templates produced better recognition performance than either alone. Finally, we describe a voice dialing system that incorporates the speaker-specific templates.
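
As an illustration of what a symbolic template might look like in practice, the following minimal sketch stores each enrolled word as one or more phone-label strings and matches a new utterance to the nearest template. The abstract does not specify the matching procedure, so the edit-distance scoring, the function names (`edit_distance`, `best_match`), the phone labels, and the example entries are all assumptions made for illustration, not the method evaluated here.

```python
from typing import Dict, List, Sequence


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two phone-label sequences (unit costs)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, phone_a in enumerate(a, start=1):
        curr = [i]  # deleting the first i phones of a
        for j, phone_b in enumerate(b, start=1):
            substitution = 0 if phone_a == phone_b else 1
            curr.append(min(prev[j] + 1,                   # deletion
                            curr[j - 1] + 1,               # insertion
                            prev[j - 1] + substitution))   # match or substitution
        prev = curr
    return prev[-1]


def best_match(observed: Sequence[str],
               templates: Dict[str, List[List[str]]]) -> str:
    """Return the entry whose closest stored phone string is nearest to `observed`."""
    return min(
        templates,
        key=lambda name: min(edit_distance(observed, t) for t in templates[name]),
    )


# Hypothetical enrolled entries: each name maps to one or more phone strings,
# as a phonetic recognizer might output them during enrollment.
TEMPLATES: Dict[str, List[List[str]]] = {
    "home": [["hh", "ow", "m"]],
    "office": [["ao", "f", "ih", "s"], ["aa", "f", "ih", "s"]],
}

if __name__ == "__main__":
    # A new utterance whose final phone was recognized differently.
    print(best_match(["hh", "ow", "n"], TEMPLATES))
```

Because each template is a short list of symbols, not a sequence of acoustic parameter vectors, it needs far less storage per word. A dictionary that holds several strings per entry is also one simple way that speaker-specific and speaker-independent templates could be kept side by side.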