Multidimensional STSA Estimators for Speech Enhancement With Correlated Spectral Components

Speech enhancement algorithms are used to remove background noise from a speech signal. In Bayesian short-time spectral amplitude (STSA) estimation for single-channel speech enhancement, the spectral components are traditionally assumed to be uncorrelated. This assumption is inexact, however, since some correlation is present in practice. In this paper, we investigate a multidimensional Bayesian STSA estimator that assumes correlated spectral components. Since a closed-form solution for this optimum estimator is not readily available, we instead derive closed-form expressions for an upper and a lower bound on the desired estimator. Using these bounds, we propose a new family of speech enhancement estimators characterized by a scalar parameter 0 ≤ γ ≤ 1, with γ = 0 corresponding to the lower bound and γ = 1 to the upper bound. An appropriate estimator for the correlation matrix of the clean speech is also derived. Evaluation results from both objective and subjective speech quality measures show that at moderate to high SNR values, where the spectral correlation of speech is most noticeable, the proposed estimators can achieve significant improvements over the traditional STSA and Wiener filter estimators.
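
The role of γ can be illustrated with a minimal sketch. The abstract states only that γ = 0 yields the lower bound and γ = 1 the upper bound; the convex combination used here is an assumption that satisfies those two endpoints, not necessarily the form derived in the paper. The function name `blend_bounds` and its array arguments are hypothetical, and the bounds themselves are taken as given inputs rather than computed.

```python
import numpy as np


def blend_bounds(lower, upper, gamma):
    """Hypothetical helper: interpolate between per-bin amplitude bounds.

    ASSUMPTION: a convex combination is used for illustration only. The
    abstract fixes just the endpoints (gamma = 0 -> lower bound,
    gamma = 1 -> upper bound), not the interpolation rule.
    """
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    if lower.shape != upper.shape:
        raise ValueError("lower and upper must have the same shape")
    # Weighted average of the two bounds, evaluated per frequency bin.
    return (1.0 - gamma) * lower + gamma * upper


# Made-up bound values for three frequency bins (not results from the paper).
lower_bound = np.array([0.2, 0.5, 0.1])
upper_bound = np.array([0.4, 0.9, 0.3])
estimate = blend_bounds(lower_bound, upper_bound, gamma=0.5)
```

Under this assumed rule, intermediate values of γ trade off the two bounds smoothly, which is one simple way a single scalar can index a whole family of estimators.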
