On expectation maximization based channel and noise estimation beyond the vector Taylor series expansion

In this work, we show how expectation maximization based simultaneous channel and noise estimation can be derived without a vector Taylor series expansion. The central idea is to approximate the distribution of all random variables involved (noisy speech, clean speech, channel, and noise) as one large joint Gaussian distribution. Instantaneous estimates of the noise and channel distribution parameters can then be obtained by conditioning this joint distribution on the observed noisy speech spectra. This approach allows expectation maximization based channel and noise estimation to be combined with the unscented transform.
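The conditioning step can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions that the abstract does not state: a single log-mel band is treated as a scalar, the interaction function is the commonly used phase-neglecting model y = log(exp(x + h) + exp(n)), and the joint Gaussian over (y, x, h, n) is built with a basic unscented transform using an arbitrary scaling parameter `kappa`. The function names `unscented_joint`, `condition_gaussian`, and `interaction` are hypothetical.

```python
import numpy as np


def unscented_joint(mu_z, cov_z, f, kappa=1.0):
    """Approximate the joint Gaussian of [f(z), z] with an unscented transform.

    mu_z, cov_z: prior mean and covariance of the hidden vector z.
    f: nonlinear function mapping z to the observation (assumed model).
    kappa: sigma-point scaling (assumption; kappa > 0 keeps weights positive).
    """
    mu_z = np.asarray(mu_z, dtype=float)
    cov_z = np.asarray(cov_z, dtype=float)
    n = mu_z.size
    # Columns of the Cholesky factor give the sigma-point offsets.
    chol = np.linalg.cholesky((n + kappa) * cov_z)
    pts = np.vstack([mu_z, mu_z + chol.T, mu_z - chol.T])  # (2n+1, n)
    w = np.full(2 * n + 1, 1.0 / (2.0 * (n + kappa)))
    w[0] = kappa / (n + kappa)
    # Propagate every sigma point through the nonlinearity.
    ys = np.array([np.atleast_1d(f(p)) for p in pts])      # (2n+1, m)
    joint = np.hstack([ys, pts])                           # rows are [y, z]
    mu = w @ joint
    dev = joint - mu
    cov = (dev * w[:, None]).T @ dev
    return mu, cov


def condition_gaussian(mu, cov, obs_idx, y_obs):
    """Condition a joint Gaussian N(mu, cov) on the components in obs_idx."""
    mu = np.asarray(mu, dtype=float)
    cov = np.asarray(cov, dtype=float)
    obs_idx = np.atleast_1d(obs_idx)
    y_obs = np.atleast_1d(np.asarray(y_obs, dtype=float))
    hid_idx = np.setdiff1d(np.arange(mu.size), obs_idx)
    s_hh = cov[np.ix_(hid_idx, hid_idx)]
    s_ho = cov[np.ix_(hid_idx, obs_idx)]
    s_oo = cov[np.ix_(obs_idx, obs_idx)]
    # gain = S_ho S_oo^{-1}, computed without forming an explicit inverse.
    gain = np.linalg.solve(s_oo, s_ho.T).T
    mu_post = mu[hid_idx] + gain @ (y_obs - mu[obs_idx])
    cov_post = s_hh - gain @ s_ho.T
    return mu_post, cov_post


def interaction(z):
    """Assumed log-mel model: y = log(exp(x + h) + exp(n)) with z = (x, h, n)."""
    x, h, n = z
    return np.logaddexp(x + h, n)


# Illustrative prior over (clean speech x, channel h, noise n) in one band.
prior_mean = np.array([5.0, 0.0, 2.0])
prior_cov = np.diag([4.0, 0.5, 1.0])

joint_mean, joint_cov = unscented_joint(prior_mean, prior_cov, interaction)
# Index 0 of the joint vector is the noisy observation y.
post_mean, post_cov = condition_gaussian(joint_mean, joint_cov, [0], [4.0])
```

Here `post_mean` and `post_cov` describe the Gaussian approximation of (x, h, n) given one noisy observation. In an expectation maximization scheme, statistics of this kind, accumulated over frames, would feed the update of the channel and noise parameters; the exact update rules are not given in the abstract and are therefore not shown.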
