Vector quantization for speaker adaptation

With a view to designing a speaker-independent large-vocabulary recognition system, we evaluate a vector quantization approach to speaker adaptation. A single speaker (the reference speaker) pronounces the application vocabulary, together with a small vocabulary called the adaptation vocabulary. Each new speaker then pronounces only the adaptation vocabulary. We investigate two adaptation methods, each of which establishes a correspondence between the codebook of the reference speaker and that of the new speaker. This correspondence allows the reference speaker's utterances to be transformed into suitable references for the new speaker. Method I represents the new speaker during recognition with a transposed codebook, whereas Method II uses a codebook obtained by clustering the new speaker's pronunciation of the adaptation vocabulary. Experiments were carried out on a 20-speaker database (10 male, 10 female). The adaptation vocabulary contains 136 words and the application vocabulary 104 words. Without adaptation, the mean recognition error rate in inter-speaker experiments is 22.3%; with one of the two methods applied, it falls to 10.5%. A comparison of the two methods shows that a separately clustered codebook is not necessary to represent the new speaker.
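As an illustration of what a codebook correspondence might look like, the following is a minimal sketch and not the procedure used in the experiments. The abstract does not say how the correspondence is established, so the sketch assumes that frames of the two speakers' adaptation utterances have already been time-aligned (for example by dynamic time warping), and that each transposed codeword is the mean of the new speaker's frames aligned with reference frames quantized to the same codeword. All function names and the toy data are hypothetical.

```python
from collections import defaultdict


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(frame, codebook):
    """Return the index of the codeword nearest to the frame."""
    return min(range(len(codebook)), key=lambda k: sq_dist(frame, codebook[k]))


def transpose_codebook(ref_codebook, aligned_pairs):
    """Build one new-speaker codeword per reference codeword.

    aligned_pairs is a list of (ref_frame, new_frame) tuples, ASSUMED to be
    time-aligned frames from the adaptation vocabulary (alignment not shown).
    """
    buckets = defaultdict(list)
    for ref_frame, new_frame in aligned_pairs:
        buckets[quantize(ref_frame, ref_codebook)].append(new_frame)

    transposed = []
    for k, codeword in enumerate(ref_codebook):
        frames = buckets.get(k)
        if not frames:
            # Assumption: keep the reference codeword when no adaptation
            # frame was assigned to it.
            transposed.append(list(codeword))
        else:
            dim = len(frames[0])
            transposed.append(
                [sum(f[d] for f in frames) / len(frames) for d in range(dim)]
            )
    return transposed


def adapt_reference(ref_indices, transposed):
    """Map a reference utterance, stored as codeword indices, to new-speaker vectors."""
    return [transposed[k] for k in ref_indices]


# Toy two-dimensional example with a two-codeword reference codebook.
ref_codebook = [[0.0, 0.0], [10.0, 10.0]]
aligned_pairs = [
    ([0.0, 0.5], [1.0, 2.0]),
    ([0.5, 0.0], [2.0, 1.0]),
    ([9.5, 10.0], [12.0, 11.0]),
]
transposed = transpose_codebook(ref_codebook, aligned_pairs)
adapted = adapt_reference([0, 1, 1], transposed)
```

Under these assumptions, a reference utterance stored as a sequence of codeword indices can be reused for a new speaker simply by swapping the codebook, which is one way to read the finding that a separately clustered codebook is not required.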