A perceptually motivated LP residual estimator in noisy and reverberant environments

Abstract: Both reverberation and additive noise degrade the quality of recorded speech, so they should be suppressed simultaneously. Previous studies have shown that the generalized singular value decomposition (GSVD) can suppress additive noise effectively, but it is seldom applied to speech dereverberation because reverberation is considered to be convolutive as well as colored. Recently, we showed that late reverberation is also an additive and relatively white interference component in the linear prediction (LP) residual domain. To suppress both late reverberation and additive noise, we proposed an optimal filter for an LP residual estimator (LPRE) based on a constrained minimum mean square error (CMMSE) criterion, computed with the GSVD, for single-channel speech enhancement; this algorithm is referred to as CMMSE-GSVD-LPRE. Experimental results showed that CMMSE-GSVD-LPRE outperforms spectral subtraction methods, but some residual noise and reverberation components remain audible and annoying. To solve this problem, this paper incorporates the masking properties of the human auditory system in the LP residual domain to further suppress these residual noise and reverberation components while also reducing speech distortion. Various simulation experiments are conducted, and the results show improved performance of the proposed algorithm. Experiments with speech recorded in noisy and reverberant environments further confirm the effectiveness of the proposed algorithm in real-world conditions.
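For readers unfamiliar with the LP residual domain mentioned in the abstract, the following is a minimal sketch of how an LP residual can be obtained from a speech frame. It is not the CMMSE-GSVD-LPRE algorithm itself: it only illustrates the standard autocorrelation method with a Levinson-Durbin recursion, and the function names (`lp_coefficients`, `lp_residual`), the model order, and the synthetic AR(2) test signal are assumptions made for illustration.

```python
import numpy as np

def lp_coefficients(frame, order):
    """Estimate LP coefficients a[0..order] (a[0] = 1) with the
    autocorrelation method and the Levinson-Durbin recursion."""
    frame = np.asarray(frame, dtype=float)
    n = len(frame)
    # Autocorrelation at lags 0..order
    r = np.array([np.dot(frame[:n - k], frame[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]  # prediction error energy, updated at each order
    for i in range(1, order + 1):
        if err <= 0.0:
            break  # degenerate (e.g. all-zero) frame
        # Reflection coefficient for order i
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err
        prev = a.copy()
        a[1:i] = prev[1:i] + k * prev[i - 1:0:-1]
        a[i] = k
        err *= 1.0 - k * k
    return a

def lp_residual(frame, order=10):
    """Return (residual, a), where residual[n] = sum_k a[k] * frame[n - k]."""
    a = lp_coefficients(frame, order)
    residual = np.convolve(frame, a)[:len(frame)]
    return residual, a

# Hypothetical test signal: a synthetic AR(2) process standing in for speech.
rng = np.random.default_rng(0)
w = rng.standard_normal(20000)
x = np.zeros_like(w)
for t in range(len(w)):
    x1 = x[t - 1] if t >= 1 else 0.0
    x2 = x[t - 2] if t >= 2 else 0.0
    x[t] = w[t] + 1.3 * x1 - 0.6 * x2

residual, a = lp_residual(x, order=2)
```

In this toy case the inverse filter approximately recovers the white excitation `w`, which is why the residual has lower variance and a flatter spectrum than `x`. In practice the analysis would be applied frame by frame to windowed speech, and the order and frame length would need to be chosen for the sampling rate in use.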
