Channel and noise adaptation via HMM mixture mean transform and stochastic matching

We present a non-linear model transformation that adapts Gaussian mixture HMMs, which use both static and dynamic MFCC observation vectors, to additive noise and constant system tilt. The transformation depends on a few compensation coefficients, which can be estimated from channel-distorted speech via maximum-likelihood stochastic matching. Experimental results validate the effectiveness of the adaptation. We also provide an adaptation strategy that can improve performance at reduced computational cost compared with a straightforward implementation of stochastic matching.
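To make the idea of a non-linear mean transform concrete, the sketch below shows one generic way a static cepstral mixture mean could be compensated for additive noise and a constant tilt. It is not the transformation derived in this paper: it uses the well-known log-add approximation, assumes a square orthonormal DCT in place of a truncated MFCC filterbank pipeline, and ignores dynamic features and variances. The names `adapt_static_mean`, `noise_cep`, and `tilt_cep` are hypothetical.

```python
import math


def dct_matrix(n):
    """Orthonormal DCT-II matrix (n x n); its inverse is its transpose."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * k * (2 * j + 1) / (2 * n))
                     for j in range(n)])
    return rows


def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def transpose(m):
    return [list(col) for col in zip(*m)]


def adapt_static_mean(mean_cep, noise_cep, tilt_cep):
    """Compensate one static cepstral mean (illustrative log-add sketch).

    Assumptions (not from the paper): a constant tilt adds a bias in the
    cepstral domain, and additive noise combines with speech as
    log(exp(speech) + exp(noise)) in the log-spectral domain.
    """
    n = len(mean_cep)
    c = dct_matrix(n)
    c_inv = transpose(c)
    # Apply the tilt as a cepstral bias, then move to the log-spectral domain.
    speech_log = matvec(c_inv, [m + t for m, t in zip(mean_cep, tilt_cep)])
    noise_log = matvec(c_inv, noise_cep)
    # Non-linear combination of speech and noise energies per channel.
    noisy_log = [math.log(math.exp(s) + math.exp(z))
                 for s, z in zip(speech_log, noise_log)]
    # Return to the cepstral domain.
    return matvec(c, noisy_log)
```

In a stochastic-matching setting, the noise and tilt vectors would play the role of the compensation coefficients, re-estimated to maximize the likelihood of the distorted speech; that estimation loop is omitted here.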
