A general joint additive and convolutive bias compensation approach applied to noisy Lombard speech recognition

A unified approach to the acoustic mismatch problem is proposed. A maximum likelihood state-based additive bias compensation algorithm is developed for the continuous density hidden Markov model (CDHMM). Based on this technique, specific bias models in the mel cepstral and linear spectral domains are presented. Among these models, a new polynomial trend bias model in the mel cepstral domain is derived, which proved effective for Lombard speech compensation. In addition, a joint estimation algorithm for additive and convolutive bias compensation is proposed. This algorithm applies the expectation-maximization (EM) technique in both of these domains, in conjunction with a parallel model combination (PMC) based transformation. The compensation of the dynamic (difference) coefficients in the proposed framework is also studied. The evaluation database consists of a vocabulary of 21 confusable words uttered by 24 speakers. Three mismatched versions of the database are considered: Lombard speech, 15 dB noisy Lombard speech, and 5 dB noisy Lombard speech. For these three mismatch conditions, the proposed techniques reduce the difference between the matched and uncompensated word error rates by 50.9%, 74.6%, and 67.3%, respectively. When dynamic coefficients are considered, the corresponding reductions are 46.8%, 72.4%, and 70.9%.
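
The abstract reports its results as a reduction in the gap between matched and uncompensated word error rates but does not spell out the formula. A minimal sketch of one plausible reading follows, assuming the reduction is the fraction of that gap closed by compensation. The function name `relative_gap_reduction` and the numbers in the usage line are hypothetical illustrations, not values from the paper.

```python
def relative_gap_reduction(wer_matched, wer_uncompensated, wer_compensated):
    """Percentage of the matched-vs-uncompensated WER gap closed by compensation.

    Assumed interpretation (not stated explicitly in the abstract):
        100 * (WER_uncompensated - WER_compensated)
            / (WER_uncompensated - WER_matched)
    All inputs are word error rates expressed in the same unit (e.g. percent).
    """
    gap = wer_uncompensated - wer_matched
    if gap <= 0:
        # No mismatch penalty to recover, so the ratio is undefined.
        raise ValueError("uncompensated WER must exceed matched WER")
    return 100.0 * (wer_uncompensated - wer_compensated) / gap


# Hypothetical error rates, chosen only to illustrate the arithmetic.
print(relative_gap_reduction(wer_matched=2.0,
                             wer_uncompensated=22.0,
                             wer_compensated=12.0))  # prints 50.0
```

Under this reading, 100% means compensation fully recovers matched-condition accuracy, and 0% means it gives no improvement over the uncompensated system.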
