Guided spectrogram filtering for speech dereverberation

Abstract

Guided filtering is a computationally efficient and powerful technique used in image processing applications such as edge-preserving smoothing, detail enhancement and single-image dehazing. In this paper, we propose a novel single-channel speech dereverberation method, guided spectrogram filtering, which treats the speech spectrogram as an image. The proposed method requires neither estimation of room acoustic parameters nor estimation of the late reverberant spectral variance. Objective test results demonstrate the effectiveness of guided spectrogram filtering for speech dereverberation. Compared with state-of-the-art speech dereverberation methods, the proposed method achieves better performance in most cases in terms of perceptual evaluation of speech quality (PESQ), speech-to-reverberation modulation energy ratio (SRMR) and short-time objective intelligibility (STOI).
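To make the idea of treating a spectrogram as an image more concrete, the following is a minimal sketch of the standard guided image filter of He, Sun and Tang applied to a 2-D log-magnitude spectrogram. The abstract does not state which guide image, window radius, regularization value or post-processing the proposed method uses, or how the filtered spectrogram is turned into dereverberated speech. The self-guided setting (guide equals input), the parameter values r = 2 and eps = 0.1, the toy test signal, and the helper names `box_mean` and `guided_filter` are therefore illustrative assumptions, not the authors' implementation. Only NumPy is used.

```python
import numpy as np


def _box_sum_axis(x, r, axis):
    """Sum of x over a sliding window of length 2r+1 along one axis (zeros outside)."""
    pad = [(0, 0)] * x.ndim
    pad[axis] = (r + 1, r)  # extra leading zero lets c[i + 2r + 1] - c[i] work at i = 0
    c = np.cumsum(np.pad(x, pad), axis=axis)
    n = x.shape[axis]
    upper = np.take(c, np.arange(2 * r + 1, 2 * r + 1 + n), axis=axis)
    lower = np.take(c, np.arange(n), axis=axis)
    return upper - lower


def box_mean(x, r):
    """Mean over a (2r+1) x (2r+1) window, truncated at the borders."""
    total = _box_sum_axis(_box_sum_axis(x, r, 0), r, 1)
    count = _box_sum_axis(_box_sum_axis(np.ones_like(x), r, 0), r, 1)
    return total / count


def guided_filter(guide, src, r=2, eps=1e-2):
    """Guided image filter on 2-D arrays such as (frequency x time) spectrograms.

    guide: guidance image I; src: image p to be filtered; r: window radius;
    eps: regularizer (larger eps gives stronger smoothing).
    """
    I = np.asarray(guide, dtype=float)
    p = np.asarray(src, dtype=float)
    mean_I = box_mean(I, r)
    mean_p = box_mean(p, r)
    var_I = box_mean(I * I, r) - mean_I ** 2
    cov_Ip = box_mean(I * p, r) - mean_I * mean_p
    # Local linear model q = a * I + b, fitted by ridge regression in each window.
    a = cov_Ip / (var_I + eps)
    b = mean_p - a * mean_I
    # Average the coefficients of all windows that cover each pixel.
    return box_mean(a, r) * I + box_mean(b, r)


if __name__ == "__main__":
    # Toy input (assumption): a noisy 440 Hz tone, not reverberant speech.
    rng = np.random.default_rng(0)
    fs = 16000
    t = np.arange(fs) / fs
    x = np.sin(2 * np.pi * 440 * t) + 0.1 * rng.standard_normal(fs)
    frame, hop = 512, 128
    win = np.hanning(frame)
    frames = np.stack([x[i:i + frame] * win for i in range(0, len(x) - frame + 1, hop)])
    log_mag = np.log(np.abs(np.fft.rfft(frames, axis=1)) + 1e-8).T  # (freq, time)
    # Self-guided filtering with assumed parameters r = 2, eps = 0.1.
    smoothed = guided_filter(log_mag, log_mag, r=2, eps=0.1)
    print(log_mag.shape, smoothed.shape)
```

The window means are computed with cumulative sums, so the cost per pixel does not depend on the radius r; this constant-time box filtering is what makes the guided filter computationally efficient. Because the output is locally a linear function of the guide, strong time-frequency edges in the guide are preserved while low-variance regions (relative to eps) are smoothed.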
