Baseform adaptation for large vocabulary hidden Markov model based speech recognition systems

A method is described for adapting the IBM speech recognition system in the situation where the system has already been trained for a new speaker and is to be further adapted and improved while that speaker is actually using it in recognition mode. A special kind of adaptation is investigated, in which the emphasis is not on the statistical parameters of the Markov models but on the structure of these models. This structure is defined by the baseforms, which describe how word models are composed from phone models in the system. Baseform adaptation therefore corresponds directly to adapting the system to the personal speaker characteristics of the new user. Several baseform adaptation schemes are investigated, and it is demonstrated that, for a speaker who has already trained the system and achieves 95.2% recognition performance, the performance can be further improved to 96.3%.
