Maximum likelihood convolutional beamformer for simultaneous denoising and dereverberation

This article describes a probabilistic formulation of the Weighted Power minimization Distortionless response convolutional beamformer (WPD). The WPD unifies a weighted prediction error based dereverberation method (WPE) and a minimum power distortionless response beamformer (MPDR) into a single convolutional beamformer, and achieves simultaneous dereverberation and denoising in an optimal way. However, its optimization criterion was obtained simply by combining existing criteria, without a clear theoretical justification. This article presents a generative model and a probabilistic formulation of the WPD, and derives an optimization algorithm based on maximum likelihood estimation. We also describe a method that uses WPE within the WPD framework to estimate the steering vector of the desired signal, providing an effective and efficient beamformer for denoising and dereverberation.
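To make the "weighted power minimization under a distortionless constraint" idea concrete, here is a minimal NumPy sketch for a single frequency bin. It is an illustration under stated assumptions, not the article's algorithm: the function name `wpd_filter`, the array layout, and the diagonal-loading term `eps` are hypothetical choices, and the time-varying power `sigma2` of the desired signal is assumed to be given rather than estimated. The filter minimizes the sum over frames of |w^H x_t|^2 / sigma2_t subject to w^H v = 1, which has the closed form w = R^{-1} v / (v^H R^{-1} v), where R is the power-weighted covariance of the stacked observations.

```python
import numpy as np


def wpd_filter(x_bar, v_bar, sigma2, eps=1e-8):
    """Sketch of a weighted-power-minimizing, distortionless filter.

    Assumptions (not taken from the article):
      x_bar  : (T, D) complex array; row t stacks the current multichannel
               frame with delayed past frames for one frequency bin.
      v_bar  : (D,) complex steering vector, zero-padded over the past frames.
      sigma2 : (T,) positive time-varying power of the desired signal,
               treated here as known.
    """
    weights = 1.0 / np.maximum(sigma2, eps)
    # Power-weighted covariance R = sum_t x_t x_t^H / sigma2_t, shape (D, D).
    R = (x_bar.T * weights) @ x_bar.conj()
    # Small diagonal loading for numerical stability (an added assumption).
    dim = R.shape[0]
    R = R + eps * np.real(np.trace(R)) / dim * np.eye(dim)
    # Closed-form solution of the constrained minimization.
    r_inv_v = np.linalg.solve(R, v_bar)
    return r_inv_v / (v_bar.conj() @ r_inv_v)


def apply_filter(w, x_bar):
    """Return the enhanced signal y_t = w^H x_t for every frame t."""
    return x_bar @ w.conj()
```

Setting every entry of `sigma2` to the same constant removes the time-varying weighting, so only the relative frame weights shape the filter; the distortionless constraint w^H v = 1 holds in both cases.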
