Speech enhancement based on neural predictive hidden Markov model

Abstract: We describe a new approach to speech enhancement that directly models the statistical characteristics of the speech waveform. To represent the nonlinear and nonstationary nature of speech, we assume that speech is the output of a neural predictive hidden Markov model (NPHMM), a nonlinear autoregressive process whose time-varying parameters are controlled by a Markov chain. Given speech data, the parameters of the NPHMM are estimated by a learning algorithm that combines the Baum–Welch algorithm with neural network training by the well-known backpropagation technique. Given the NPHMM parameters, we develop a recursive estimation method that uses multiple Kalman filters, governed by a Markov state chain according to its transition probabilities, to enhance speech signals degraded by statistically independent additive noise assumed to be white and Gaussian. Across various input signal-to-noise ratios (SNRs), the proposed recursive method improves the measured output SNR by about 0.8–1.2 dB over the method based on the hidden filter model (Lee and Shirai, 1996).
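To make the generative assumption concrete, the following is a minimal sketch of a Markov-switched nonlinear autoregressive process of the kind the abstract describes: a Markov chain picks a state at each sample, and a state-specific neural predictor maps past samples to the next one, to which Gaussian excitation is added. Everything specific here is an assumption for illustration and not taken from the paper: the function names `make_predictor` and `simulate_nphmm`, the one-hidden-layer tanh network, the random (untrained) weights, the prediction order, and the transition probabilities. It does not implement the Baum–Welch/backpropagation training or the Kalman-filter enhancement stage.

```python
import math
import random


def make_predictor(order, hidden, rng):
    """Build a random one-hidden-layer tanh network (hypothetical, untrained)
    that maps `order` past samples to one predicted sample."""
    w1 = [[rng.gauss(0, 0.5) for _ in range(order)] for _ in range(hidden)]
    w2 = [rng.gauss(0, 0.3) for _ in range(hidden)]

    def predict(past):
        h = [math.tanh(sum(w * x for w, x in zip(row, past))) for row in w1]
        return sum(v * a for v, a in zip(w2, h))

    return predict


def simulate_nphmm(n_samples, trans, predictors, noise_std, order, seed=0):
    """Generate samples from a toy Markov-switched nonlinear AR process.

    trans[i][j] is the assumed probability of moving from state i to state j;
    noise_std[i] is the assumed excitation standard deviation in state i.
    """
    rng = random.Random(seed)
    state = 0
    past = [0.0] * order  # most recent sample first
    signal, states = [], []
    for _ in range(n_samples):
        # Markov chain step: draw the next state from the current row.
        state = rng.choices(range(len(trans)), weights=trans[state])[0]
        # Nonlinear prediction from past samples plus Gaussian excitation.
        sample = predictors[state](past) + rng.gauss(0, noise_std[state])
        signal.append(sample)
        states.append(state)
        past = [sample] + past[:-1]
    return signal, states


# Example: two states, prediction order 4 (all values are illustrative).
ORDER = 4
weight_rng = random.Random(1)
predictors = [make_predictor(ORDER, 3, weight_rng) for _ in range(2)]
trans = [[0.95, 0.05], [0.10, 0.90]]
signal, states = simulate_nphmm(200, trans, predictors, [0.1, 0.3], ORDER)
```

In an enhancement setting, white Gaussian noise would be added to `signal`, and the hidden state sequence in `states` would be unknown to the estimator; that is the situation the multiple Kalman filters in the abstract are meant to handle.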

[1] C.E. Mokbel, et al. Automatic word recognition in cars, 1995, IEEE Trans. Speech Audio Process.

[2] Lawrence R. Rabiner, et al. A tutorial on hidden Markov models and selected applications in speech recognition, 1989, Proc. IEEE.

[3] L. Baum, et al. An inequality and associated maximization technique in statistical estimation of probabilistic functions of a Markov process, 1972.

[4] Biing-Hwang Juang, et al. Mixture autoregressive hidden Markov models for speech signals, 1985, IEEE Trans. Acoust. Speech Signal Process.

[5] Lizhong Wu, et al. On the design of nonlinear speech predictors with recurrent nets, 1994, Proceedings of ICASSP '94. IEEE International Conference on Acoustics, Speech and Signal Processing.

[6] J. Flanagan. Speech Analysis, Synthesis and Perception, 1971.

[7] Lizhong Wu, et al. Fully vector-quantized neural network-based code-excited nonlinear predictive speech coding, 1994, IEEE Trans. Speech Audio Process.

[8] B. Anderson, et al. Optimal Filtering, 1979, IEEE Transactions on Systems, Man, and Cybernetics.

[9] Ki Yong Lee, et al. Efficient recursive estimation for speech enhancement in colored noise, 1996, IEEE Signal Processing Letters.

[10] Yariv Ephraim, et al. Speech enhancement using state dependent dynamical system model, 1992, ICASSP-92: 1992 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[11] Yoshihiro Takada, et al. Neural Predictive Hidden Markov Model for Speech Recognition, 1995, IEICE Trans. Inf. Syst.

[12] Ki Yong Lee, et al. Recursive Estimation for Speech Enhancement using the Hidden Filter Model, 1995.

[13] Hamid Sheikhzadeh, et al. Waveform-based speech recognition using hidden filter models: parameter selection and sensitivity to power normalization, 1994, IEEE Trans. Speech Audio Process.

[14] Naftali Z. Tishby. On the application of mixture AR hidden Markov models to text independent speaker recognition, 1991, IEEE Trans. Signal Process.

[15] Kuldip K. Paliwal, et al. A speech enhancement method based on Kalman filtering, 1987, ICASSP '87. IEEE International Conference on Acoustics, Speech, and Signal Processing.

[16] Esther Levin. Hidden control neural architecture modeling of nonlinear time varying systems and its applications, 1993, IEEE Trans. Neural Networks.

[17] Yariv Ephraim, et al. A Bayesian estimation approach for speech enhancement using hidden Markov models, 1992, IEEE Trans. Signal Process.