Speech enhancement based on minimum mean-square error estimation and supergaussian priors

This paper presents a class of minimum mean-square error (MMSE) estimators for enhancing short-time spectral coefficients of a noisy speech signal. In contrast to most methods currently in use, we do not assume that the spectral coefficients of the noise or of the clean speech signal obey a (complex) Gaussian probability density. We derive analytical solutions to the problem of estimating discrete Fourier transform (DFT) coefficients in the MMSE sense when the prior probability density function of the clean speech DFT coefficients can be modeled by a complex Laplacian or by a complex bilateral Gamma density. The probability density function of the noise DFT coefficients may be modeled either by a complex Gaussian or by a complex Laplacian density. Compared to algorithms based on the Gaussian assumption, such as the Wiener filter or the Ephraim and Malah (1984) MMSE short-time spectral amplitude estimator, the estimators based on these supergaussian densities deliver an improved signal-to-noise ratio.
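To make the difference between a Gaussian and a supergaussian prior concrete, the following minimal sketch compares the Wiener estimate with an MMSE estimate under a Laplacian speech prior and Gaussian noise. It is not the paper's analytical solution: it evaluates the conditional mean E[S | Y = y] by brute-force numerical integration, and it assumes a single real-valued component (for example the real part of one DFT coefficient) with known variances. The function names and the parameters `var_s`, `var_n`, `grid_size`, and `span` are hypothetical and chosen only for illustration.

```python
import numpy as np


def wiener_estimate(y, var_s, var_n):
    """MMSE estimate when both speech and noise are Gaussian (linear in y)."""
    return var_s / (var_s + var_n) * y


def laplace_mmse_estimate(y, var_s, var_n, grid_size=20001, span=12.0):
    """Numerically evaluate E[S | Y = y] for Y = S + N.

    Assumptions (illustrative, not taken from the paper):
      S is real-valued and Laplacian with variance var_s,
      N is real-valued and Gaussian with variance var_n,
      S and N are independent.
    """
    b = np.sqrt(var_s / 2.0)  # Laplacian scale, since variance = 2 * b**2
    # Uniform grid wide enough to cover the posterior mass around 0 and y.
    half_width = span * np.sqrt(var_s + var_n) + abs(y)
    s = np.linspace(-half_width, half_width, grid_size)
    prior = np.exp(-np.abs(s) / b)                       # unnormalised p(s)
    likelihood = np.exp(-(y - s) ** 2 / (2.0 * var_n))   # unnormalised p(y | s)
    weights = prior * likelihood
    # On a uniform grid the step size cancels in the ratio of the two sums.
    return float(np.sum(s * weights) / np.sum(weights))


if __name__ == "__main__":
    var_s, var_n = 1.0, 1.0
    for y in (0.2, 1.0, 3.0, 6.0):
        print(y,
              wiener_estimate(y, var_s, var_n),
              laplace_mmse_estimate(y, var_s, var_n))
```

With equal speech and noise variances, the Laplacian-prior estimate attenuates small observations more strongly than the Wiener gain and large observations less strongly, which is the qualitative behaviour expected from a prior that is more peaked at zero and heavier-tailed than a Gaussian.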

[1]  Steven F. Boll, et al. Optimal estimators for spectral restoration of noisy speech, 1984, ICASSP.

[2]  Jin Yang. Frequency domain noise suppression approaches in mobile telephone systems, 1993, 1993 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[3]  R. McAulay, et al. Speech enhancement using a soft-decision noise suppression filter, 1980.

[4]  D. Brillinger. Time series - data analysis and theory, 1981, Classics in applied mathematics.

[5]  N. L. Johnson, et al. Continuous Univariate Distributions, 1995.

[6]  Rainer Martin, et al. Speech enhancement in the DFT domain using Laplacian speech priors, 2003.

[7]  David Middleton, et al. Simultaneous optimum detection and estimation of signals in noise, 1968, IEEE Trans. Inf. Theory.

[8]  Keith E. Muller, et al. Computing the confluent hypergeometric function, M(a,b,x), 2001, Numerische Mathematik.

[9]  Rainer Martin, et al. Speech enhancement using MMSE short time spectral estimation with gamma distributed speech priors, 2002, 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[10]  Rainer Martin, et al. MMSE estimation of magnitude-squared DFT coefficients with superGaussian priors, 2003, 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '03).

[11]  Pascal Scalart, et al. Speech enhancement based on a priori signal to noise estimation, 1996, 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[12]  Mark Nardin, et al. Numerical evaluation of the confluent hypergeometric function for complex arguments of large magnitudes, 1992.

[13]  David Malah, et al. Speech enhancement using a minimum mean-square error log-spectral amplitude estimator, 1984, IEEE Trans. Acoust. Speech Signal Process.

[14]  Olivier Cappé, et al. Elimination of the musical noise phenomenon with the Ephraim and Malah noise suppressor, 1994, IEEE Trans. Speech Audio Process.

[15]  S. Kullback, et al. Information Theory and Statistics, 1959.

[16]  Ephraim. Speech enhancement using a minimum mean square error short-time spectral amplitude estimator, 1984.

[17]  David Malah, et al. Tracking speech-presence uncertainty to improve speech enhancement in non-stationary noise environments, 1999, 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP99).

[18]  Rainer Martin, et al. Noise power spectral density estimation based on optimal smoothing and minimum statistics, 2001, IEEE Trans. Speech Audio Process.

[19]  Mark Nardin, et al. Algorithm 707: CONHYP: a numerical evaluator of the confluent hypergeometric function for complex arguments of large magnitudes, 1992, TOMS.

[20]  S. Boll, et al. Suppression of acoustic noise in speech using spectral subtraction, 1979.

[21]  R. Martin, et al. New speech enhancement techniques for low bit rate speech coding, 1999, 1999 IEEE Workshop on Speech Coding: Model, Coders, and Error Criteria.

[22]  T. Lotter. Noise reduction by maximum a posteriori spectral amplitude estimation with supergaussian speech modeling, 2003.