With a view to designing a speaker-independent, large-vocabulary recognition system, we evaluate a vector quantization approach to speaker adaptation. Only one speaker, the reference speaker, pronounces the application vocabulary. He also pronounces a small vocabulary called the adaptation vocabulary. Each new speaker then only has to pronounce the adaptation vocabulary. We investigate two adaptation methods, both of which establish a correspondence between the codebooks of the reference speaker and the new speaker. This correspondence allows the reference speaker's reference utterances to be transformed into suitable references for the new speaker. During recognition, Method I represents the new speaker with a transposed version of the reference speaker's codebook, whereas Method II uses a codebook obtained by clustering the new speaker's pronunciations of the adaptation vocabulary. Experiments were carried out on a 20-speaker database (10 male, 10 female). The adaptation vocabulary contains 136 words and the application vocabulary 104 words. Without adaptation, the mean recognition error rate in inter-speaker experiments is 22.3%; after adaptation with one of the two methods, it falls to 10.5%. Comparing the two methods shows that a codebook built from the new speaker's own speech is not necessary to represent the new speaker.
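The abstract does not say how the codebook correspondence is computed. The Python sketch below shows one plausible way to build a transposed codebook in the spirit of Method I, under assumptions that are ours and not the authors': frames are real-valued feature vectors, the reference codebook already exists (for example, designed with a clustering algorithm such as LBG), each adaptation word spoken by the reference speaker is aligned with the same word spoken by the new speaker using plain dynamic time warping (DTW), and each reference codeword is replaced by the mean of the new speaker's frames aligned to it. All function names (quantize, dtw_path, transposed_codebook, adapt_reference) are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(frame: Vector, codebook: Sequence[Vector]) -> int:
    """Index of the nearest codeword (nearest-neighbour vector quantization)."""
    return min(range(len(codebook)), key=lambda k: sq_dist(frame, codebook[k]))


def dtw_path(ref: Sequence[Vector], new: Sequence[Vector]) -> List[Tuple[int, int]]:
    """Align two utterances with basic DTW; returns (ref_index, new_index) pairs."""
    n, m = len(ref), len(new)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = sq_dist(ref[i - 1], new[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    # Backtrack from the last frame pair to recover the warping path.
    path: List[Tuple[int, int]] = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = {
            (i - 1, j - 1): cost[i - 1][j - 1],  # diagonal step preferred on ties
            (i - 1, j): cost[i - 1][j],
            (i, j - 1): cost[i][j - 1],
        }
        i, j = min(steps, key=steps.get)
    return path[::-1]


def transposed_codebook(
    ref_codebook: Sequence[Vector],
    ref_utts: Sequence[Sequence[Vector]],
    new_utts: Sequence[Sequence[Vector]],
) -> List[List[float]]:
    """Assumed mapping: average the new speaker's frames aligned to each reference codeword.

    ref_utts[w] and new_utts[w] are the two speakers' pronunciations of adaptation word w.
    """
    dim = len(ref_codebook[0])
    sums = [[0.0] * dim for _ in ref_codebook]
    counts = [0] * len(ref_codebook)
    for ref_utt, new_utt in zip(ref_utts, new_utts):
        for i, j in dtw_path(ref_utt, new_utt):
            k = quantize(ref_utt[i], ref_codebook)
            counts[k] += 1
            for d in range(dim):
                sums[k][d] += new_utt[j][d]
    # Codewords never observed in the adaptation data keep their reference value.
    return [
        [s / counts[k] for s in sums[k]] if counts[k] else list(ref_codebook[k])
        for k in range(len(ref_codebook))
    ]


def adapt_reference(ref_indices: Sequence[int], mapped: Sequence[Vector]) -> List[List[float]]:
    """Re-express a reference template (codeword indices) in the new speaker's space."""
    return [list(mapped[k]) for k in ref_indices]
```

For example, with the reference codebook [[0, 0], [10, 10]] and a new speaker whose frames are all offset by +1, the transposed codebook becomes [[1.0, 1.0], [11.0, 11.0]], and adapt_reference rewrites the reference speaker's templates in that shifted space. Keeping unobserved codewords unchanged is only one possible fallback; a real system might instead interpolate from neighbouring codewords.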