Log-spectral feature reconstruction based on an occlusion model for noise robust speech recognition

This paper addresses feature compensation in the log-spectral domain for speech recognition in noise by recasting the speech distortion problem as an occlusion problem. The usual non-linear mismatch function that represents the speech distortion due to additive noise can be reasonably well approximated by the maximum of the two mixing sources (speech and noise). Using this approximation, we propose to enhance the degraded speech features by means of a novel minimum mean square error (MMSE) estimator. The resulting technique shows clear similarities with soft-mask missing-data (MD) reconstruction; nevertheless, experimental results on both the Aurora-2 and Aurora-4 databases show the effectiveness of the proposed technique in comparison with MD.
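
To illustrate why the maximum is a reasonable stand-in for the mismatch function, the following minimal sketch compares the two for a single log-spectral bin. It assumes the commonly used additive form y = log(exp(x) + exp(n)), with x the clean-speech and n the noise log-spectral energy and phase terms ignored; the abstract does not state the exact form used in the paper, so this form and the function names are illustrative assumptions only.

```python
import math


def exact_mismatch(x, n):
    """Assumed mismatch function y = log(exp(x) + exp(n)), computed stably."""
    m = max(x, n)
    # log(e^x + e^n) = max(x, n) + log(1 + e^{-|x - n|})
    return m + math.log1p(math.exp(-abs(x - n)))


def max_approx(x, n):
    """Occlusion (log-max) approximation: the dominant source masks the other."""
    return max(x, n)


def approx_error(x, n):
    """Gap between the assumed mismatch function and the max approximation."""
    return exact_mismatch(x, n) - max_approx(x, n)


if __name__ == "__main__":
    for x, n in [(0.0, 0.0), (2.0, 0.0), (5.0, 0.0), (0.0, 10.0)]:
        print(f"x={x:5.1f} n={n:5.1f} error={approx_error(x, n):.6f}")
```

Under this assumed form, the error equals log(1 + exp(-|x - n|)), so it is largest (log 2) when speech and noise have equal energy and shrinks rapidly as one source dominates the other.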