Irrelevant variability normalization based HMM training using VTS approximation of an explicit model of environmental distortions

In the traditional HMM compensation approach to robust speech recognition, which uses a Vector Taylor Series (VTS) approximation of an explicit model of environmental distortions, the set of generic HMMs is typically trained from “clean” speech only. In this paper, we present a maximum likelihood approach to training generic HMMs from both “clean” and “corrupted” speech, based on the concept of irrelevant variability normalization. Evaluation results on the Aurora2 connected digits database demonstrate that the proposed approach achieves significant improvements in recognition accuracy over the traditional VTS-based HMM compensation approach.