Speech spectral modeling and enhancement based on autoregressive conditional heteroscedasticity models

In this paper, we develop and evaluate speech enhancement algorithms based on supergaussian generalized autoregressive conditional heteroscedasticity (GARCH) models in the short-time Fourier transform (STFT) domain. We consider three statistical models, two fidelity criteria, and two approaches for estimating the variances of the STFT coefficients. The statistical model is Gaussian, Gamma, or Laplacian; the fidelity criterion is either minimum mean-squared error (MMSE) of the STFT coefficients or MMSE of the log-spectral amplitude (LSA); and the spectral variance is estimated by either the proposed GARCH models or the decision-directed method of Ephraim and Malah. We show that estimating the variance by the GARCH modeling method yields lower log-spectral distortion and higher perceptual evaluation of speech quality (PESQ, ITU-T P.862) scores than the decision-directed method, regardless of whether the presumed statistical model is Gaussian, Gamma, or Laplacian, and whether the fidelity criterion is MMSE of the STFT coefficients or MMSE of the LSA. Furthermore, while the Gaussian model is inferior to the supergaussian models when using the decision-directed method, it is superior when using the GARCH modeling method.
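To make the contrast between the two variance estimators concrete, the following is a minimal sketch for a single frequency bin under the Gaussian model. It is an illustration under stated assumptions, not the paper's exact formulation: the function names, the parameter values (`alpha`, `xi_min`, `kappa`, `mu`, `delta`), and the use of a generic GARCH(1,1) recursion driven by the conditional second moment of the clean coefficient are all assumptions introduced here.

```python
def wiener_gain(xi):
    """MMSE gain for a complex Gaussian STFT coefficient with a priori SNR xi."""
    return xi / (1.0 + xi)


def decision_directed(noisy_power, noise_var, alpha=0.98, xi_min=1e-3):
    """Decision-directed a priori SNR track for one frequency bin.

    noisy_power: sequence of |Y_t|^2 values; noise_var: assumed known noise variance.
    alpha and xi_min are illustrative values, not taken from the paper.
    """
    track = []
    prev_clean_power = 0.0  # |X_hat_{t-1}|^2 from the previous frame
    for y2 in noisy_power:
        gamma = y2 / noise_var  # a posteriori SNR
        xi = alpha * prev_clean_power / noise_var \
            + (1.0 - alpha) * max(gamma - 1.0, 0.0)
        xi = max(xi, xi_min)
        gain = wiener_gain(xi)
        prev_clean_power = gain ** 2 * y2
        track.append(xi)
    return track


def garch_variance(noisy_power, noise_var, kappa=0.01, mu=0.8, delta=0.15):
    """GARCH(1,1)-style one-frame-ahead speech variance track for one bin.

    Assumed recursion: lam_{t+1|t} = kappa + mu * E{|X_t|^2 | Y_t} + delta * lam_{t|t-1},
    with mu + delta < 1. The clean coefficient is not observed, so its squared
    magnitude is replaced by its conditional second moment (Gaussian model).
    """
    lam_pred = kappa / (1.0 - mu - delta)  # start at the unconditional variance
    track = []
    for y2 in noisy_power:
        track.append(lam_pred)
        # Update step: posterior variance plus squared posterior mean.
        gain = wiener_gain(lam_pred / noise_var)
        second_moment = gain * noise_var + gain ** 2 * y2
        # Propagation step: predict the variance of the next frame.
        lam_pred = kappa + mu * second_moment + delta * lam_pred
    return track


# Hypothetical input: five noise-only frames followed by a strong speech onset.
frames = [1.0] * 5 + [100.0] * 5
dd_track = decision_directed(frames, noise_var=1.0)
garch_track = garch_variance(frames, noise_var=1.0)
```

Both recursions feed the previous frame's estimate forward; the difference in this sketch is that the GARCH-style predictor propagates a conditional second moment of the clean coefficient, whereas the decision-directed rule mixes the previous amplitude estimate with the current a posteriori SNR.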

[1]  Israel Cohen.  Modeling speech signals in the time-frequency domain using GARCH , 2004, Signal Process..

[2]  Richard M. Schwartz,et al.  Enhancement of speech corrupted by acoustic noise , 1979, ICASSP.

[3]  Simon J. Godsill,et al.  Efficient Alternatives to the Ephraim and Malah Suppression Rule for Audio Signal Enhancement , 2003, EURASIP J. Adv. Signal Process..

[4]  Robert F. Engle,et al.  ARCH: Selected Readings , 1995 .

[5]  Hamid Sheikhzadeh,et al.  HMM-based strategies for enhancement of speech signals embedded in nonstationary noise , 1998, IEEE Trans. Speech Audio Process..

[6]  Ephraim.  Speech enhancement using a minimum mean square error short-time spectral amplitude estimator , 1984 .

[7]  Olivier Cappé,et al.  Elimination of the musical noise phenomenon with the Ephraim and Malah noise suppressor , 1994, IEEE Trans. Speech Audio Process..

[8]  A.V. Oppenheim,et al.  Enhancement and bandwidth compression of noisy speech , 1979, Proceedings of the IEEE.

[9]  Wonyong Sung,et al.  A statistical model-based voice activity detection , 1999, IEEE Signal Processing Letters.

[10]  Israel Cohen,et al.  Speech enhancement using super-Gaussian speech models and noncausal a priori SNR estimation , 2005, Speech Commun..

[11]  Richard V. Cox,et al.  A modular approach to speech enhancement with an application to speech coding , 1999, IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '99).

[12]  Biing-Hwang Juang,et al.  Mixture autoregressive hidden Markov models for speech signals , 1985, IEEE Trans. Acoust. Speech Signal Process..

[13]  Neri Merhav,et al.  Hidden Markov processes , 2002, IEEE Trans. Inf. Theory.

[14]  P. Laguna,et al.  Signal Processing , 2002, Yearbook of Medical Informatics.

[15]  Steven F. Boll,et al.  Optimal estimators for spectral restoration of noisy speech , 1984, ICASSP.

[16]  David Malah,et al.  Speech enhancement using a minimum mean-square error log-spectral amplitude estimator , 1984, IEEE Trans. Acoust. Speech Signal Process..

[17]  Bronwyn H Hall,et al.  Estimation and Inference in Nonlinear Structural Models , 1974 .

[18]  R. Chou,et al.  ARCH modeling in finance: A review of the theory and empirical evidence , 1992 .

[19]  Rainer Martin,et al.  Speech enhancement in the DFT domain using Laplacian speech priors , 2003 .

[20]  Rainer Martin,et al.  Speech enhancement using MMSE short time spectral estimation with gamma distributed speech priors , 2002, IEEE International Conference on Acoustics, Speech, and Signal Processing.

[21]  Israel Cohen,et al.  Relaxed statistical model for speech enhancement and a priori SNR estimation , 2005, IEEE Transactions on Speech and Audio Processing.

[22]  Rainer Martin,et al.  MMSE estimation of magnitude-squared DFT coefficients with superGaussian priors , 2003, IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '03).

[23]  Peter Vary,et al.  Multichannel speech enhancement using Bayesian spectral amplitude estimation , 2003, IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '03).

[24]  Israel Cohen,et al.  Speech enhancement for non-stationary noise environments , 2001, Signal Process..

[25]  Israel Cohen,et al.  Noise spectrum estimation in adverse environments: improved minima controlled recursive averaging , 2003, IEEE Trans. Speech Audio Process..

[26]  T. Bollerslev,et al.  Generalized autoregressive conditional heteroskedasticity , 1986 .