Long term on-line speaker adaptation for large vocabulary dictation

Online speaker adaptation is desirable for speech recognition dictation applications because it makes it possible to improve the system with speaker-specific data obtained from the user. Since the user works with a dictation system over a long period, long-term adaptation performance is more important than adaptation speed. In contrast to speaker-dependent retraining, online speaker adaptation does not require the speaker-specific speech data to be stored, and no single adaptation step requires a large computational effort. We describe how we perform online Bayesian speaker adaptation using partial traceback. We compare supervised with unsupervised adaptation, and speaker adaptation with speaker-dependent training on the adaptation material. In our experiments, five hours of supervised adaptation halved the error rate relative to the speaker-independent initial models. In the long-term experiments, supervised online adaptation performed similarly to speaker-dependent training on the adaptation material.
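
The abstract names online Bayesian adaptation but does not give the exact update. A common form is MAP adaptation of Gaussian means in the style of Gauvain and Lee, and the sketch below assumes that form. The class name `OnlineMapMean` and the prior weight `tau` are hypothetical. The sketch illustrates why online adaptation needs no stored speech: each Gaussian keeps only its prior mean and two running statistics, the occupancy (sum of the posteriors gamma_t) and the posterior-weighted sum of feature vectors. The posteriors are assumed to come from aligning a segment whose word sequence has already been fixed, for example by partial traceback.

```python
from dataclasses import dataclass, field
from typing import List, Sequence


@dataclass
class OnlineMapMean:
    """Incremental MAP estimate of one Gaussian mean (hypothetical helper).

    Assumption: only the mean is adapted, with a fixed prior weight tau.
    No past feature vectors are stored, only two running statistics.
    """
    prior_mean: List[float]          # speaker-independent mean mu_0
    tau: float                       # prior weight (assumed hyperparameter)
    occupancy: float = 0.0           # running sum of gamma_t
    weighted_sum: List[float] = field(default_factory=list)  # running sum of gamma_t * x_t

    def __post_init__(self) -> None:
        if not self.weighted_sum:
            self.weighted_sum = [0.0] * len(self.prior_mean)

    def accumulate(self, frames: Sequence[Sequence[float]],
                   gammas: Sequence[float]) -> None:
        """Add the statistics of one finished segment, then forget the frames."""
        for x, g in zip(frames, gammas):
            self.occupancy += g
            for d, value in enumerate(x):
                self.weighted_sum[d] += g * value

    def mean(self) -> List[float]:
        """MAP mean: (tau * mu_0 + sum_t gamma_t x_t) / (tau + sum_t gamma_t)."""
        denom = self.tau + self.occupancy
        return [(self.tau * m0 + s) / denom
                for m0, s in zip(self.prior_mean, self.weighted_sum)]


if __name__ == "__main__":
    gauss = OnlineMapMean(prior_mean=[0.0, 0.0], tau=10.0)
    gauss.accumulate([[1.0, 2.0], [3.0, 2.0]], gammas=[1.0, 1.0])
    print(gauss.mean())  # both components equal 4/12
```

With `tau=10`, two frames move the mean only slightly away from the speaker-independent prior. As the occupancy grows over hours of use, the estimate approaches the mean of the speaker's own data. Each update costs time proportional to the feature dimension per frame. Because the statistics are simple sums, accumulating them segment by segment gives the same estimate as one batch pass over all the data.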