Speech Enhancement Using Gaussian Scale Mixture Models

This paper presents a novel probabilistic approach to speech enhancement. Instead of a deterministic logarithmic relationship, we assume a probabilistic relationship between the frequency coefficients and the log-spectra. The speech model in the log-spectral domain is a Gaussian mixture model (GMM). The frequency coefficients obey a zero-mean Gaussian whose covariance equals the exponential of the log-spectra. Because the log-spectra can be regarded as scaling factors, this results in a Gaussian scale mixture model (GSMM) for the speech signal in the frequency domain. The probabilistic relation between frequency coefficients and log-spectra allows both to be treated as random variables, each estimated from the noisy signal. Expectation-maximization (EM) was used to train the GSMM, and Bayesian inference was used to compute the posterior signal distribution. Because exact inference in this full probabilistic model is computationally intractable, we developed two efficient approximations: the Laplace method and a variational approximation. The proposed methods were applied to enhance speech corrupted by Gaussian noise and by speech-shaped noise (SSN). For both approximations, signals reconstructed from the estimated frequency coefficients provided a higher signal-to-noise ratio (SNR), while those reconstructed from the estimated log-spectra produced a lower word recognition error rate, because the log-spectra fit the inputs to the recognizer better. Our algorithms effectively reduced the SSN, which algorithms based on spectral analysis were not able to suppress.
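
The generative structure described above (a GMM over the log-spectrum, then a zero-mean Gaussian coefficient whose variance is the exponential of that log-spectrum) can be illustrated with a minimal sketch. It is a toy under stated assumptions, not the paper's implementation: it uses a single frequency bin, real-valued coefficients, and hypothetical mixture parameters, and the function names `sample_gsmm` and `marginal_variance` are introduced here for illustration only.

```python
import math
import random


def sample_gsmm(weights, means, stds, n, seed=0):
    """Draw n (log_spectrum, coefficient) pairs from a 1-D Gaussian scale mixture.

    Assumption: one real-valued coefficient per draw; the mixture parameters
    are illustrative and not taken from the paper.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        # Pick a mixture component, then draw the log-spectrum s from it (GMM prior).
        k = rng.choices(range(len(weights)), weights=weights)[0]
        s = rng.gauss(means[k], stds[k])
        # Given s, the coefficient is zero-mean Gaussian with variance exp(s),
        # i.e. standard deviation exp(s / 2).
        x = rng.gauss(0.0, math.exp(0.5 * s))
        samples.append((s, x))
    return samples


def marginal_variance(weights, means, stds):
    """Closed-form Var[x] = E[exp(s)] = sum_k w_k * exp(mu_k + sigma_k^2 / 2)."""
    return sum(w * math.exp(m + 0.5 * sd * sd)
               for w, m, sd in zip(weights, means, stds))


if __name__ == "__main__":
    w, mu, sd = [0.3, 0.7], [-1.0, 1.0], [0.5, 0.5]  # hypothetical parameters
    draws = sample_gsmm(w, mu, sd, 20000)
    empirical = sum(x * x for _, x in draws) / len(draws)
    print("empirical variance:", empirical)
    print("closed-form variance:", marginal_variance(w, mu, sd))
```

Because the log-spectrum acts as a random scale, the marginal distribution of the coefficient is heavier-tailed than a single Gaussian; the sketch only checks the second moment, which follows from the log-normal mean of each mixture component.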
