Multiple stream model-based feature enhancement for noise robust speech recognition

In this paper, we motivate the introduction of multiple feature streams to bridge the gap between the noise-free and the estimated features in the context of Model-Based Feature Enhancement (MBFE) for noise robust speech recognition. Especially at low local SNR levels, the global MMSE estimate may not be optimal and its uncertainty is large. Therefore, we first show how solving a constrained quadratic optimisation problem can improve the linear combination weights in the MMSE formula. Alternatively, these weights are approximated by K Kronecker deltas. Both approaches are compared in recognition experiments on the Aurora2 task. In addition, Multiple Stream MBFE is validated on the large vocabulary Aurora4 benchmark task. On the latter, the average Word Error Rate decreased from 37.73% (no enhancement) to 26.13% (single stream MBFE) and finally to 24.89% (multiple stream MBFE).
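The contrast between the single MMSE estimate and the Kronecker delta approximation can be illustrated with a minimal sketch. It assumes that MBFE has already produced one clean-feature estimate per mixture component, `estimates[k]`, and the corresponding posterior `posteriors[k]` (both hypothetical inputs here). The MMSE estimate is their posterior-weighted linear combination. Replacing the weight vector by K one-hot (Kronecker delta) vectors instead gives K separate streams, each carrying a single component's estimate. Keeping the K components with the largest posteriors is an assumption of this sketch, because the abstract does not state how the K deltas are chosen. The constrained quadratic optimisation of the weights is not shown.

```python
from typing import List, Sequence

Vector = List[float]


def mmse_estimate(posteriors: Sequence[float], estimates: Sequence[Vector]) -> Vector:
    """Single-stream estimate: posterior-weighted sum of per-component estimates."""
    if len(posteriors) != len(estimates) or not estimates:
        raise ValueError("need one posterior per (non-empty) list of component estimates")
    dim = len(estimates[0])
    combined = [0.0] * dim
    for weight, estimate in zip(posteriors, estimates):
        for d in range(dim):
            combined[d] += weight * estimate[d]
    return combined


def kronecker_streams(
    posteriors: Sequence[float], estimates: Sequence[Vector], K: int
) -> List[Vector]:
    """Approximate the weight vector by K Kronecker deltas: one stream per kept component.

    ASSUMPTION (hypothetical selection rule): keep the K components with the
    largest posteriors. Each stream is that component's estimate, unmixed.
    """
    order = sorted(range(len(posteriors)), key=lambda k: posteriors[k], reverse=True)
    return [list(estimates[k]) for k in order[:K]]


# Toy example with three components and 2-dimensional features (made-up numbers).
posteriors = [0.5, 0.3, 0.2]
estimates = [[1.0, 2.0], [3.0, 0.0], [0.0, 5.0]]
single = mmse_estimate(posteriors, estimates)          # approximately [1.4, 2.0]
streams = kronecker_streams(posteriors, estimates, 2)  # [[1.0, 2.0], [3.0, 0.0]]
```

With K equal to the number of components and each stream weighted by its posterior, summing the streams reproduces the MMSE estimate. Keeping the streams separate instead lets the recogniser see alternative hypotheses that the single weighted average would otherwise blur together.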