Multichannel Online Dereverberation Based on Spectral Magnitude Inverse Filtering

This paper addresses the problem of multichannel online dereverberation. The proposed method operates in the short-time Fourier transform (STFT) domain and processes each frequency band independently. In the STFT domain, the time-domain room impulse response is approximately represented by the convolutive transfer function (CTF). The multichannel CTFs are adaptively identified using the cross-relation method with a recursive least squares criterion. Instead of the complex-valued CTF convolution model, we use a nonnegative convolution model between the STFT magnitude of the source signal and the CTF magnitude. This model is only a coarse approximation of the complex-valued one, but it is shown to be more robust against CTF perturbations. Based on this nonnegative model, we propose an online STFT magnitude inverse filtering method. The inverse filters of the CTF magnitude are formulated based on the multiple-input/output inverse theorem (MINT) and adaptively estimated by gradient descent. Finally, the inverse filters are applied to the STFT magnitude of the microphone signals to obtain an estimate of the STFT magnitude of the source signal. Experiments on both speech enhancement and automatic speech recognition demonstrate that the proposed method can effectively suppress reverberation, even in the difficult case of a moving speaker.
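
The magnitude inverse filtering idea can be illustrated for a single frequency band with a minimal sketch. It is not the paper's online algorithm. It assumes that the CTF magnitudes are already known and fixed, uses a unit impulse with no modeling delay as the MINT target, and runs plain batch gradient descent instead of a frame-by-frame adaptive update. The function names, filter lengths, and toy CTF values are all hypothetical.

```python
import numpy as np


def convolution_matrix(a, n_inv):
    """Return the matrix C such that C @ h equals np.convolve(a, h) for len(h) == n_inv."""
    a = np.asarray(a, dtype=float)
    C = np.zeros((len(a) + n_inv - 1, n_inv))
    for j in range(n_inv):
        C[j:j + len(a), j] = a
    return C


def mint_gradient_descent(ctf_mags, n_inv, n_iter=2000):
    """Estimate inverse filters h_i so that sum_i conv(|a_i|, h_i) approximates a unit impulse.

    ctf_mags: list of 1-D arrays, one CTF magnitude per channel (assumed known here).
    Returns an array of shape (n_channels, n_inv).
    """
    # Stack the per-channel convolution matrices side by side (MINT system matrix).
    A = np.hstack([convolution_matrix(a, n_inv) for a in ctf_mags])
    d = np.zeros(A.shape[0])
    d[0] = 1.0  # target: impulse at lag 0 (illustrative choice, no modeling delay)
    # A step of 1 / sigma_max^2 keeps least-squares gradient descent stable.
    step = 1.0 / np.linalg.norm(A, 2) ** 2
    h = np.zeros(A.shape[1])
    for _ in range(n_iter):
        h -= step * (A.T @ (A @ h - d))
    return h.reshape(len(ctf_mags), n_inv)


def apply_inverse_filters(filters, mic_mags):
    """Filter each channel's STFT magnitude sequence and sum across channels."""
    n_frames = len(mic_mags[0])
    return sum(np.convolve(h, x)[:n_frames] for h, x in zip(filters, mic_mags))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy two-channel CTF magnitudes for one frequency band (hypothetical values).
    ctf_mags = [np.array([1.0, 0.5, 0.2]), np.array([1.0, 0.3, 0.4])]
    source_mag = rng.random(50)  # nonnegative source STFT magnitude
    # Nonnegative convolution model: microphone magnitude = CTF magnitude * source magnitude.
    mic_mags = [np.convolve(a, source_mag)[:50] for a in ctf_mags]
    filters = mint_gradient_descent(ctf_mags, n_inv=2)
    estimate = apply_inverse_filters(filters, mic_mags)
    print("relative error:", np.linalg.norm(estimate - source_mag) / np.linalg.norm(source_mag))
```

With two channels of length 3 and inverse filters of length 2, the stacked system is square, so an exact inverse exists when the channels share no common zeros. How closely gradient descent approaches it within a given number of iterations depends on the conditioning of the system.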
