Recognition of noisy speech by a nonstationary AR HMM with gain adaptation under unknown noise

In this paper, a gain-adapted method for recognizing speech in unknown noise is developed in the time domain. The noise is assumed to be colored. To cope with the notably nonstationary nature of speech signals such as fricatives, glides, liquids, and transition regions between phones, a nonstationary autoregressive (NAR) hidden Markov model (HMM) is used for clean speech. The nonstationary AR process is represented using polynomial functions, expressed as a linear combination of M known basis functions. When only noisy signals are available, the unknown noise must inevitably be estimated as well. Multiple Kalman filters are used to estimate the noise model and the gain contour of the speech.
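
The basis-function representation mentioned above can be illustrated with a minimal sketch. The abstract gives no equations, so everything below is an assumption made for illustration: each AR coefficient is taken to vary in time as a_i(t) = sum_m w[i, m] * f_m(t), the basis f_m(t) is taken to be the m-th power of normalized time, and the sign convention x(t) = sum_i a_i(t) x(t-i) + g e(t) is assumed. The names `polynomial_basis`, `time_varying_ar_coeffs`, and `synthesize_nar` are hypothetical and do not come from the paper.

```python
import numpy as np


def polynomial_basis(num_samples, num_basis):
    """Return a (num_samples, num_basis) matrix with f_m(t) = (t / (T - 1))**m.

    Assumed basis: powers of time normalized to [0, 1].
    """
    t = np.arange(num_samples) / max(num_samples - 1, 1)
    return np.vander(t, num_basis, increasing=True)


def time_varying_ar_coeffs(weights, basis):
    """Combine basis functions linearly into AR coefficient trajectories.

    weights: (P, M) array, one row of basis weights per AR coefficient.
    basis:   (T, M) array from polynomial_basis.
    Returns a (T, P) array whose entry [t, i-1] is a_i(t).
    """
    return basis @ weights.T


def synthesize_nar(weights, gain, num_samples, rng):
    """Generate one realization of the assumed nonstationary AR process."""
    order, num_basis = weights.shape
    basis = polynomial_basis(num_samples, num_basis)
    coeffs = time_varying_ar_coeffs(weights, basis)
    excitation = rng.standard_normal(num_samples)
    x = np.zeros(num_samples)
    for t in range(num_samples):
        # Weighted sum of past samples; samples before t = 0 are treated as zero.
        for i in range(1, order + 1):
            if t - i >= 0:
                x[t] += coeffs[t, i - 1] * x[t - i]
        # Gain-scaled white Gaussian excitation.
        x[t] += gain * excitation[t]
    return x, coeffs


# Illustrative values only: order P = 2, M = 2 basis functions (constant and linear).
weights = np.array([[0.5, 0.2],
                    [-0.3, 0.0]])
x, coeffs = synthesize_nar(weights, gain=0.1, num_samples=50,
                           rng=np.random.default_rng(0))
```

With M = 1 the basis reduces to a constant and the sketch falls back to an ordinary stationary AR process, which is one way to see what the extra basis functions add.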
