A Constrained MMSE LP Residual Estimator for Speech Dereverberation in Noisy Environments

This paper first shows that late reverberation and noise both act as additive interference components in the linear prediction (LP) residual domain. It then proposes to suppress these components with a constrained minimum mean square error (MMSE) LP residual estimator, whose optimal filter can be obtained by the generalized singular value decomposition (GSVD). The LP residuals of the late reverberation and of the noise are estimated continuously, using a noise power spectral density estimator that does not depend on voice activity detection (VAD) and a continuously updated late reverberant spectral variance estimator. A non-intrusive objective measure and PESQ scores show that the proposed algorithm outperforms traditional LP residual-based algorithms and spectral subtraction-based algorithms.
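To make the idea of a constrained MMSE filter concrete, the following is a minimal sketch under stated assumptions; it is not the paper's estimator. It assumes covariance matrices of the clean LP residual and of the combined interference residual (late reverberation plus noise) are already available, and it uses the standard time-domain constrained form H = R_clean (R_clean + mu * R_interf)^{-1}, computed through a joint diagonalization of the two covariances. That eigenvalue formulation is closely related to the GSVD. The function name `constrained_mmse_filter`, the parameter `mu`, and the synthetic matrices are all hypothetical.

```python
import numpy as np

def constrained_mmse_filter(R_clean, R_interf, mu=1.0):
    """Return H = R_clean (R_clean + mu * R_interf)^{-1} via joint diagonalization.

    R_clean  : covariance of the clean LP residual (assumed symmetric PSD)
    R_interf : covariance of the interference residual (assumed symmetric PD)
    mu       : Lagrange multiplier trading distortion against residual interference
    """
    # Whiten the interference covariance: R_interf = L L^T.
    L = np.linalg.cholesky(R_interf)
    L_inv = np.linalg.inv(L)
    # Eigendecompose the whitened clean covariance (symmetric matrix).
    lam, U = np.linalg.eigh(L_inv @ R_clean @ L_inv.T)
    lam = np.maximum(lam, 0.0)  # clip tiny negative values caused by round-off
    # V jointly diagonalizes both: V^T R_interf V = I and V^T R_clean V = diag(lam).
    V = L_inv.T @ U
    # Per-component gain in the jointly diagonalized domain.
    gains = lam / (lam + mu)
    # H = V^{-T} diag(gains) V^T
    return np.linalg.solve(V.T, np.diag(gains) @ V.T)

# Synthetic covariances, for illustration only.
rng = np.random.default_rng(0)
A = rng.standard_normal((8, 8))
B = rng.standard_normal((8, 8))
R_clean = A @ A.T / 8
R_interf = B @ B.T / 8 + 0.1 * np.eye(8)

H = constrained_mmse_filter(R_clean, R_interf, mu=2.0)
H_direct = R_clean @ np.linalg.inv(R_clean + 2.0 * R_interf)
print(np.allclose(H, H_direct))
```

An enhanced residual frame would then be obtained as `H @ e` for a length-8 frame `e` of the degraded LP residual. In this form, a larger `mu` suppresses more interference at the cost of more distortion of the desired residual.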
