Speaker adaptation using semi-continuous hidden Markov models

This paper presents a new approach to speaker adaptation based on semi-continuous hidden Markov models (SCHMMs). The authors introduce a modification of the semi-continuous codebook update that allows rapid speaker adaptation. The approach rests on the idea that the phonetic information already incorporated in a trained model should be used to update the codebook. The different acoustic representation of a new speaker is thus learned, while the connection between codebook entries and model states remains unchanged. Several experiments were carried out with a small speech sample. They demonstrate that the new codebook update performs better than conventional SCHMM codebook updating, and that about 40 seconds of adaptation speech is enough to recover 50 percent of the performance difference between full speaker-dependent training and no adaptation at all.
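The idea of moving the shared codebook while keeping its link to the model states can be illustrated with a minimal Python sketch. It assumes diagonal-covariance Gaussian codewords and assumes that the state posteriors gamma[t][s] for the adaptation data have already been computed with the trained (speaker-independent) model, for example by forward-backward over the known transcription. The sketch applies standard SCHMM codebook mean re-estimation while holding the state-dependent weights b[s][k] fixed. This is one way to preserve the connection between codebook entries and model states, and it is not necessarily the authors' exact modification. All names (`adapt_codebook_means`, `min_occupancy`, and so on) are hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]


def log_gaussian_diag(x: Sequence[float], mean: Sequence[float], var: Sequence[float]) -> float:
    """Log density of a diagonal-covariance Gaussian evaluated at x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def adapt_codebook_means(
    frames: Sequence[Vector],
    state_post: Sequence[Sequence[float]],  # gamma[t][s], assumed precomputed with the trained model
    weights: Sequence[Sequence[float]],     # b[s][k], held fixed (state-to-codeword connection)
    means: Sequence[Vector],                # codebook means mu_k
    variances: Sequence[Vector],            # codebook variances (not updated in this sketch)
    min_occupancy: float = 1e-3,            # hypothetical threshold below which a mean is kept
) -> List[Vector]:
    """Sketch (assumption): re-estimate only the shared codebook means on adaptation data.

    The weights b[s][k] are never modified, so each codeword keeps its phonetic
    role in every state; only its position in feature space moves.
    """
    K, D = len(means), len(means[0])
    occ = [0.0] * K                      # sum over t, s of zeta
    acc = [[0.0] * D for _ in range(K)]  # sum over t, s of zeta * x_t

    for x, gamma_t in zip(frames, state_post):
        loglik = [log_gaussian_diag(x, means[k], variances[k]) for k in range(K)]
        peak = max(loglik)  # common scale factor, cancels in the per-state normalization
        lik = [math.exp(l - peak) for l in loglik]
        for s, gamma in enumerate(gamma_t):
            if gamma == 0.0:
                continue
            w = [weights[s][k] * lik[k] for k in range(K)]
            total = sum(w)
            if total == 0.0:
                continue
            for k in range(K):
                # Posterior of being in state s and using codeword k at frame t.
                zeta = gamma * w[k] / total
                if zeta == 0.0:
                    continue
                occ[k] += zeta
                for d in range(D):
                    acc[k][d] += zeta * x[d]

    # Codewords with too little adaptation data keep their trained means.
    return [
        [a / occ[k] for a in acc[k]] if occ[k] >= min_occupancy else list(means[k])
        for k in range(K)
    ]
```

For example, with one state, one-dimensional codebook means at 0, 10 and 100, weights (0.5, 0.5, 0.0) and adaptation frames 1, 2, 11 and 12, the first two means move to approximately 1.5 and 11.5. The third codeword is unused by the state, so its occupancy stays below `min_occupancy` and it keeps its mean of 100. Updating only the means is a choice made in this sketch because little adaptation speech is available; variances could be re-estimated the same way given more data.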
