Asymptotically exact noise-corrupted speech likelihoods

Model compensation techniques for noise-robust speech recognition approximate the corrupted speech distribution. This paper introduces a sampling method that, given speech and noise distributions and a mismatch function, computes the corrupted speech likelihood exactly in the limit of infinitely many samples. Although the method is too slow to compensate a speech recognition system, it enables a more fine-grained assessment of compensation techniques, based on the KL divergence of individual components. This makes it possible to evaluate the impact of the approximations that compensation schemes make, such as the form of the mismatch function.
