Frame-wise HMM adaptation using state-dependent reverberation estimates

A novel frame-wise model adaptation approach for reverberation-robust distant-talking speech recognition is proposed. It adjusts the means of the static cepstral features to capture the statistics of reverberant feature vector sequences obtained from distant-talking speech recordings. The HMM means are adapted during decoding using a state-dependent estimate of the late reverberation, which is determined by jointly using a feature-domain reverberation model and optimum partial state sequences. Since the parameters of the HMMs and of the reverberation model can be estimated completely independently of each other, the approach is very flexible with respect to changing acoustic environments. Because the models are adapted frame by frame, some of the limitations of HMMs are alleviated, and recognition results surpassing those of matched reverberant training are obtained at the cost of a moderately increased decoding complexity.
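As a rough illustration of the mean adaptation step, the following sketch combines a clean-speech state mean with a late-reverberation estimate and maps the result to cepstra. It rests on assumptions that are not stated in the abstract: that the combination is carried out in the log-mel-spectral domain, that the two contributions add in the linear mel-spectral domain, and that an orthonormal DCT-II yields the static cepstral features. The function names `dct_matrix`, `adapt_logmel_mean`, and `adapted_cepstral_mean` are hypothetical, and the sketch does not show how the late-reverberation estimate is obtained from the reverberation model and the partial state sequences.

```python
import numpy as np


def dct_matrix(n_cep: int, n_mel: int) -> np.ndarray:
    """Orthonormal DCT-II matrix mapping n_mel log-mel values to n_cep cepstra."""
    k = np.arange(n_cep)[:, None]
    m = np.arange(n_mel)[None, :]
    C = np.sqrt(2.0 / n_mel) * np.cos(np.pi * k * (m + 0.5) / n_mel)
    C[0, :] /= np.sqrt(2.0)  # scale the zeroth row so that all rows have unit norm
    return C


def adapt_logmel_mean(clean_logmel_mean: np.ndarray,
                      late_reverb_logmel: np.ndarray) -> np.ndarray:
    """Assumed combination rule: add both contributions in the linear
    mel-spectral domain, i.e. log(exp(a) + exp(b)), computed stably."""
    return np.logaddexp(clean_logmel_mean, late_reverb_logmel)


def adapted_cepstral_mean(clean_logmel_mean: np.ndarray,
                          late_reverb_logmel: np.ndarray,
                          n_cep: int) -> np.ndarray:
    """Adapted static cepstral mean for one HMM state and one frame."""
    C = dct_matrix(n_cep, clean_logmel_mean.shape[0])
    return C @ adapt_logmel_mean(clean_logmel_mean, late_reverb_logmel)
```

In a decoder following this sketch, `adapted_cepstral_mean` would be evaluated once per frame and per active state, with `late_reverb_logmel` supplied by the state-dependent reverberation estimate; this per-frame evaluation is where the additional decoding cost would arise.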