Towards non-stationary model-based noise adaptation for large vocabulary speech recognition

Recognition rates of speech recognition systems are known to degrade substantially when there is a mismatch between the training and deployment environments. One approach to this problem is to transform the acoustic models according to the channel distortion and noise characteristics of the new environment. Most current model adaptation strategies assume that the noise characteristics are stationary. We present results from using multiple noise distributions with the Whisper large vocabulary speech recognition system. The distributions are adapted with the vector Taylor series method, and the adapted models are combined using either a weighted average of the noise states or the locally best noise states. Our results indicate that, for certain types of noise, significant gains in recognition accuracy can be achieved.
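As a rough illustration of the idea, the following is a minimal sketch of vector Taylor series (VTS) mean adaptation with several noise states. It is not the implementation evaluated here. It assumes the commonly used log-spectral mismatch function y = x + h + log(1 + exp(n - x - h)) with a zeroth-order expansion around the means, and it adapts only the means, not the variances. The function names, the noise-state weights, and the simplification that the "locally best" state is the one with the largest supplied weight are all hypothetical choices made for the example.

```python
import numpy as np


def vts_adapt_mean(mu_x, mu_n, mu_h=0.0):
    """Zeroth-order VTS estimate of the noisy-speech mean (log-spectral domain).

    Assumed mismatch function: y = x + h + log(1 + exp(n - x - h)),
    evaluated at the clean-speech mean mu_x, noise mean mu_n and
    channel mean mu_h.
    """
    mu_x = np.asarray(mu_x, dtype=float)
    mu_n = np.asarray(mu_n, dtype=float)
    # log(exp(x + h) + exp(n)) is algebraically the same expression,
    # and logaddexp evaluates it without overflow.
    return np.logaddexp(mu_x + mu_h, mu_n)


def adapt_with_noise_states(mu_x, noise_means, noise_weights, mode="weighted"):
    """Adapt one clean-speech mean against several noise states.

    mode="weighted": average the per-state adapted means by the weights.
    mode="best":     keep only the adapted mean of the highest-weight state
                     (a stand-in for a locally best state selection).
    """
    adapted = np.stack([vts_adapt_mean(mu_x, mu_n) for mu_n in noise_means])
    weights = np.asarray(noise_weights, dtype=float)
    weights = weights / weights.sum()  # normalise so the weights sum to 1
    if mode == "weighted":
        return weights @ adapted
    if mode == "best":
        return adapted[np.argmax(weights)]
    raise ValueError(f"unknown mode: {mode!r}")
```

In this sketch, a noise state whose mean lies far below the speech mean leaves the speech mean essentially unchanged, while a noise state at the same level as the speech raises the adapted mean by log 2. In a full system the state weights would presumably vary over time, which is what allows non-stationary noise to be tracked.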