Unsupervised Spectral Subtraction for Noise-Robust ASR on Unknown Transmission Channels

This paper addresses several issues of classical spectral subtraction methods for automatic speech recognition in noisy environments. The main contributions are twofold. First, a channel normalization method is proposed to extend spectral subtraction to unknown transmission channels, such as cellphone channels. It equalizes the transmission channel and removes part of the additive noise. Second, a simple, computationally efficient \mbox{2-component} probabilistic model is proposed to discriminate between speech and additive noise at the magnitude spectrogram level. Based on this model, an alternative to classical spectral subtraction is proposed, called ``Unsupervised Spectral Subtraction'' (USS). The main difference from classical spectral subtraction is that USS does not require any parameter tuning. Experimental studies on Aurora 2 show that channel normalization followed by USS compares favorably with both classical spectral subtraction and the ETSI standard front-end (Wiener filtering). Compared to the ETSI standard front-end, a 21.3% relative improvement is obtained under 0 to 20 dB noise conditions, at the cost of a 0.1% absolute loss in clean conditions. The computational cost of the proposed approach is very low, which makes it suitable for real-time applications.
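For context, the classical baseline that USS is meant to replace can be sketched in a few lines. This is a minimal, generic magnitude-domain variant of Berouti-style spectral subtraction, not the paper's USS method or its channel normalization. It assumes that the first few frames of the magnitude spectrogram contain noise only. The function names `estimate_noise` and `spectral_subtract` and the parameters `alpha` (over-subtraction factor) and `beta` (spectral floor) are illustrative. `alpha` and `beta` are exactly the kind of hand-tuned parameters that the abstract says USS avoids.

```python
from typing import List, Sequence

Frame = Sequence[float]  # one frame of a magnitude spectrogram (one value per frequency bin)


def estimate_noise(frames: Sequence[Frame], n_noise_frames: int) -> List[float]:
    """Average the first n_noise_frames frames, ASSUMED to contain noise only."""
    if not 1 <= n_noise_frames <= len(frames):
        raise ValueError("n_noise_frames must be in [1, len(frames)]")
    n_bins = len(frames[0])
    leading = frames[:n_noise_frames]
    return [sum(frame[k] for frame in leading) / n_noise_frames for k in range(n_bins)]


def spectral_subtract(
    frames: Sequence[Frame],
    noise: Sequence[float],
    alpha: float = 2.0,  # hypothetical default over-subtraction factor (must be tuned)
    beta: float = 0.01,  # hypothetical default spectral floor (must be tuned)
) -> List[List[float]]:
    """Subtract alpha * noise per bin, clamping the result to a floor of beta * noise."""
    return [
        [max(y - alpha * n, beta * n) for y, n in zip(frame, noise)]
        for frame in frames
    ]


if __name__ == "__main__":
    # Two noise-only frames followed by one frame containing speech energy.
    spec = [[1.0, 2.0], [1.0, 2.0], [5.0, 2.5]]
    noise = estimate_noise(spec, n_noise_frames=2)
    print(spectral_subtract(spec, noise, alpha=1.0, beta=0.1))
```

With `alpha=1.0` and `beta=0.1`, the noise-only frames collapse to the floor `[0.1, 0.2]`, while the last frame keeps its excess energy `[4.0, 0.5]`. Changing `alpha` or `beta` trades residual noise against speech distortion, which is why the classical method needs per-condition tuning.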