A challenging, unsolved problem in the speech recognition community is recognizing speech signals that are corrupted by loud, highly nonstationary noise. One approach to noisy speech recognition is to automatically remove the noise from the cepstrum sequence before feeding it into a clean speech recognizer. In previous work published in Eurospeech, we showed how a probability model trained on clean speech and a separate probability model trained on noise could be combined to estimate the noise-free speech from the noisy speech, and how an iterative second-order vector Taylor series approximation could be used for probabilistic inference in this model. In many circumstances, however, it is not possible to obtain examples of noise without speech. Moreover, noise statistics may change significantly during an utterance, so that speech-free frames are not sufficient for estimating the noise model. In this paper, we show how the noise model can be learned even when the data contains speech. In particular, the noise model can be learned from the test utterance and then used to denoise that same utterance. The approximate inference technique serves as an approximate E step in a generalized EM algorithm that learns the parameters of the noise model from the test utterance. For both Wall Street Journal data with added noise samples and the Aurora benchmark, we show that the new noise-adaptive technique performs as well as or significantly better than the non-adaptive algorithm, without the need for a separate training set of noise examples.