Efficient VQ-based MMSE estimation for robust speech recognition

This paper presents a feature compensation technique based on minimum mean square error (MMSE) estimation for robust speech recognition. Like other MMSE compensation methods based on stereo data, our approach models the differences between the clean and noisy feature spaces, and the resulting MMSE estimate of the clean feature vector is obtained as a piecewise linear transformation of the noisy one. However, unlike other well-known MMSE techniques such as SPLICE or MEMLIN, which model the feature spaces with GMMs, in our proposal each feature space is characterized by a set of cells obtained by means of vector quantization (VQ). This VQ-based approach allows a very efficient implementation of the MMSE estimator. In addition, the degradation inherent to any VQ process is overcome by a strategy that considers different subregions inside each cell and applies a subregion-based mean and variance compensation. The experimental results show that, along with a very efficient MMSE estimator, our technique achieves even better recognition accuracies than SPLICE and MEMLIN.
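To illustrate the general idea of stereo-data-trained, VQ-based piecewise compensation, the following is a toy sketch (not the authors' exact algorithm): a k-means codebook partitions the noisy feature space into cells, a per-cell correction bias is learned from stereo (clean, noisy) pairs, and compensation is a nearest-cell lookup plus bias. All function names and the bias-only correction are illustrative simplifications; the paper's method additionally uses subregions within each cell and variance compensation.

```python
import numpy as np

def train_vq_codebook(Y, k, iters=20, seed=0):
    """Build a simple k-means VQ codebook over noisy features Y of shape (n, d)."""
    rng = np.random.default_rng(seed)
    centroids = Y[rng.choice(len(Y), size=k, replace=False)].copy()
    for _ in range(iters):
        # Assign each noisy vector to its nearest centroid (its VQ cell).
        dists = ((Y[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        labels = dists.argmin(1)
        for j in range(k):
            mask = labels == j
            if mask.any():
                centroids[j] = Y[mask].mean(0)
    return centroids

def train_cell_biases(X, Y, centroids):
    """Learn a per-cell MMSE bias from stereo data: clean X, noisy Y (aligned rows)."""
    dists = ((Y[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
    labels = dists.argmin(1)
    biases = np.zeros_like(centroids)
    for j in range(len(centroids)):
        mask = labels == j
        if mask.any():
            # Mean clean-minus-noisy difference within this cell.
            biases[j] = (X[mask] - Y[mask]).mean(0)
    return biases

def compensate(Y, centroids, biases):
    """Piecewise compensation: nearest-cell lookup, then add that cell's bias."""
    dists = ((Y[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
    labels = dists.argmin(1)
    return Y + biases[labels]
```

Because compensation reduces to one nearest-neighbour search plus a table lookup, it avoids evaluating Gaussian posteriors at run time, which is the source of the efficiency gain the abstract claims for the VQ-based estimator over GMM-based ones.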