Bayesian MMSE Filtering of Noisy Speech by SNR Marginalization With Global PSD Priors

This paper addresses MMSE filtering of speech in additive noise when the speech power spectral density (PSD) is latent. This problem is prominent in single-channel speech enhancement and limits the utility of stationary Wiener filters and other PSD-based statistical estimators. It typically manifests itself as residual noise after filtering, even when the noise PSD is available. We therefore incorporate the latent speech PSD into the MMSE estimation framework for complex speech spectral amplitudes via marginalization. The resulting joint posterior distribution of the complex speech amplitude and the speech PSD, conditioned only on the noisy observations, is then factored in the Bayesian sense into a speech posterior and a speech-PSD posterior. The latter is expressed via the local data likelihood and a hyper-prior of the local speech PSD or a-priori SNR, i.e., a global distribution across the entire speech signal. In this way, marginalization becomes an expectation over a latent Wiener filter, so that explicit estimation of the local a-priori SNR is eliminated. The local input data in the form of the a-posteriori SNR, together with the global SNR value as a descriptor of the overall speech-in-noise condition, turn out to be sufficient to control the resulting MMSE spectral gain function and can potentially be provided much more easily than the latent, time-varying a-priori SNR. Objective experimental evaluation demonstrates an improved balance of residual noise and speech quality in the enhancement of noisy speech.
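The idea of marginalization as an expectation over a latent Wiener filter can be illustrated with a minimal numerical sketch. It is not the paper's closed-form gain. It assumes a complex Gaussian observation model, so that the a-posteriori SNR gamma given the a-priori SNR xi is exponentially distributed with mean 1 + xi, and it assumes, purely for illustration, an exponential hyper-prior on xi whose mean equals the global SNR. The function name `marginalized_wiener_gain` and its parameters are hypothetical.

```python
import numpy as np

def marginalized_wiener_gain(gamma, global_snr, num_points=4000, xi_max_factor=50.0):
    """Approximate G(gamma) = E[xi / (1 + xi) | gamma] on a uniform grid over xi.

    gamma      : a-posteriori SNR(s), |Y|^2 / noise PSD (linear scale)
    global_snr : mean of the ASSUMED exponential hyper-prior on xi (linear scale)
    """
    gamma = np.atleast_1d(np.asarray(gamma, dtype=float))

    # Uniform grid over the latent a-priori SNR xi.
    xi = np.linspace(0.0, xi_max_factor * global_snr, num_points)

    # ASSUMPTION: exponential hyper-prior p(xi) with mean global_snr.
    prior = np.exp(-xi / global_snr) / global_snr

    # Local data likelihood under the complex Gaussian model:
    # p(gamma | xi) = exp(-gamma / (1 + xi)) / (1 + xi)
    likelihood = np.exp(-gamma[:, None] / (1.0 + xi[None, :])) / (1.0 + xi[None, :])

    # Unnormalized posterior p(xi | gamma) for each gamma (one row per gamma).
    posterior = likelihood * prior[None, :]

    # Latent Wiener filter for each candidate xi.
    wiener = xi / (1.0 + xi)

    # Posterior mean of the Wiener gain; the grid spacing cancels in the ratio.
    return (posterior * wiener[None, :]).sum(axis=1) / posterior.sum(axis=1)

if __name__ == "__main__":
    gains = marginalized_wiener_gain([0.5, 1.0, 4.0, 16.0], global_snr=1.0)
    print(gains)
```

Under these assumptions the gain depends only on the local a-posteriori SNR and the global SNR, and no local a-priori SNR estimate enters the computation. A different hyper-prior would change the shape of the gain curve but not this structure.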
