Joint removal of additive and convolutional noise with model-based feature enhancement

In this paper we describe an extension of the model-based feature enhancement (MBFE) algorithm that jointly removes additive and convolutional noise from corrupted speech. Although a model of the clean speech can incorporate prior knowledge into the feature enhancement process, this model no longer fits accurately when a different microphone is used. To counter the resulting performance degradation, we combine a new iterative EM algorithm, which estimates the channel, with the MBFE algorithm, which removes nonstationary additive noise. In the latter, the parameters of a shifted clean speech HMM and a noise HMM are first combined by a vector Taylor series approximation, and then the state-conditional MMSE estimates of the clean speech are calculated. Recognition experiments on the Aurora4 task confirm the benefit of this approach: compared to the Advanced Front-End standard, we obtained an average relative reduction in WER of 12% for clean training and 2.8% for multi-condition training.
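
The combination step described above can be illustrated with a minimal sketch. It is not the paper's implementation, and it rests on assumptions the abstract does not state: features are taken to be log-spectral, covariances are diagonal, only a single pair of one clean speech Gaussian and one noise Gaussian is treated, and the channel shift `h` is assumed to be known already (the iterative EM estimation of the channel and the weighting over HMM state pairs are omitted). The function names `vts_combine` and `mmse_clean_estimate` are hypothetical. Under these assumptions the mismatch function is y = x + h + log(1 + exp(n - x - h)), linearized to first order around the Gaussian means.

```python
import math


def vts_combine(mu_x, var_x, mu_n, var_n, h):
    """First-order VTS combination of one clean-speech Gaussian (mu_x, var_x)
    and one noise Gaussian (mu_n, var_n), with the clean speech shifted by the
    channel h. All arguments are per-bin lists (assumed log-spectral domain,
    diagonal covariances)."""
    mu_y, var_y, cov_xy = [], [], []
    for mx, vx, mn, vn, hk in zip(mu_x, var_x, mu_n, var_n, h):
        d = mn - mx - hk
        # Numerically stable log(1 + exp(d)).
        softplus = max(d, 0.0) + math.log1p(math.exp(-abs(d)))
        # Mismatch function evaluated at the means.
        mu_y.append(mx + hk + softplus)
        # g = dy/dx = 1 / (1 + exp(d)); the derivative w.r.t. noise is 1 - g.
        g = math.exp(-softplus)
        var_y.append(g * g * vx + (1.0 - g) ** 2 * vn)
        # Cross-covariance between clean speech and noisy observation.
        cov_xy.append(g * vx)
    return mu_y, var_y, cov_xy


def mmse_clean_estimate(y, mu_x, mu_y, var_y, cov_xy):
    """MMSE estimate of the clean speech for this single Gaussian pair,
    i.e. the conditional mean E[x | y] under the linearized joint model."""
    return [
        mx + c / vy * (yk - my)
        for yk, mx, my, vy, c in zip(y, mu_x, mu_y, var_y, cov_xy)
    ]


# Usage: two bins, the first dominated by speech, the second by noise.
mu_y, var_y, cov_xy = vts_combine(
    mu_x=[10.0, -10.0], var_x=[1.0, 1.0],
    mu_n=[-10.0, 10.0], var_n=[1.0, 1.0],
    h=[0.5, 0.0],
)
x_hat = mmse_clean_estimate([11.0, 12.0], [10.0, -10.0], mu_y, var_y, cov_xy)
```

In the speech-dominated bin the estimate is close to the observation minus the channel shift, while in the noise-dominated bin it falls back to the prior mean of the clean speech model. A full system along the lines of the abstract would compute such estimates for every state pair and combine them, which this sketch does not attempt.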