Multi-Microphone Complex Spectral Mapping for Speech Dereverberation

This study proposes a multi-microphone complex spectral mapping approach for speech dereverberation on a fixed array geometry. In the proposed approach, a deep neural network (DNN) is trained to predict the real and imaginary (RI) components of direct sound from the stacked reverberant (and noisy) RI components of multiple microphones. We also investigate the integration of multi-microphone complex spectral mapping with beamforming and post-filtering. Experimental results on multi-channel speech dereverberation demonstrate the effectiveness of the proposed approach.

[1]  Deliang Wang,et al.  On Spatial Features for Supervised Speech Separation and its Application to Beamforming and Robust ASR , 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[2]  Jonathan Le Roux,et al.  Improved MVDR Beamforming Using Single-Channel Mask Prediction Networks , 2016, INTERSPEECH.

[3]  Zhong-Qiu Wang,et al.  Deep Learning Based Target Cancellation for Speech Dereverberation , 2020, IEEE/ACM Transactions on Audio, Speech, and Language Processing.

[4]  DeLiang Wang,et al.  Deep Learning Based Phase Reconstruction for Speaker Separation: A Trigonometric Perspective , 2018, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[5]  Reinhold Häb-Umbach,et al.  BLSTM supported GEV beamformer front-end for the 3RD CHiME challenge , 2015, 2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU).

[6]  Emmanuel Vincent,et al.  A Consolidated Perspective on Multimicrophone Speech Enhancement and Source Separation , 2017, IEEE/ACM Transactions on Audio, Speech, and Language Processing.

[7]  Zhong-Qiu Wang,et al.  Multi-Channel Deep Clustering: Discriminative Spectral and Spatial Embeddings for Speaker-Independent Speech Separation , 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[8]  Yu Tsao,et al.  Complex spectrogram enhancement by convolutional neural network with multi-metrics learning , 2017, 2017 IEEE 27th International Workshop on Machine Learning for Signal Processing (MLSP).

[9]  DeLiang Wang,et al.  Supervised Speech Separation Based on Deep Learning: An Overview , 2017, IEEE/ACM Transactions on Audio, Speech, and Language Processing.

[10]  Jen-Wei Huang,et al.  Multichannel Speech Enhancement by Raw Waveform-Mapping Using Fully Convolutional Networks , 2020, IEEE/ACM Transactions on Audio, Speech, and Language Processing.

[11]  Dong Yu,et al.  Multitalker Speech Separation With Utterance-Level Permutation Invariant Training of Deep Recurrent Neural Networks , 2017, IEEE/ACM Transactions on Audio, Speech, and Language Processing.

[12]  Jonathan Le Roux,et al.  SDR – Half-baked or Well Done? , 2018, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[13]  Jon Barker,et al.  The third 'CHiME' speech separation and recognition challenge: Analysis and outcomes , 2017, Comput. Speech Lang..

[14]  Nima Mesgarani,et al.  Real-time Single-channel Dereverberation and Separation with Time-domain Audio Separation Network , 2018, INTERSPEECH.

[15]  DeLiang Wang,et al.  Complex Ratio Masking for Monaural Speech Separation , 2016, IEEE/ACM Transactions on Audio, Speech, and Language Processing.

[16]  John R. Hershey,et al.  Multichannel End-to-end Speech Recognition , 2017, ICML.

[17]  GannotSharon,et al.  A Consolidated Perspective on Multimicrophone Speech Enhancement and Source Separation , 2017 .

[18]  Ke Tan,et al.  Complex Spectral Mapping with a Convolutional Recurrent Network for Monaural Speech Enhancement , 2019, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[19]  Jonathan Le Roux,et al.  Phase-sensitive and recognition-boosted speech separation using deep recurrent neural networks , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[20]  Zhong-Qiu Wang,et al.  Integrating Spectral and Spatial Features for Multi-Channel Speaker Separation , 2018, INTERSPEECH.

[21]  R. Maas,et al.  A summary of the REVERB challenge: state-of-the-art and remaining challenges in reverberant speech processing research , 2016, EURASIP Journal on Advances in Signal Processing.

[22]  Kilian Q. Weinberger,et al.  Densely Connected Convolutional Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[23]  Zhong-Qiu Wang,et al.  End-to-End Speech Separation with Unfolded Iterative Phase Reconstruction , 2018, INTERSPEECH.

[24]  Tetsuji Ogawa,et al.  Multi-Channel Speech Enhancement Using Time-Domain Convolutional Denoising Autoencoder , 2019, INTERSPEECH.

[25]  Zhuo Chen,et al.  Deep clustering: Discriminative embeddings for segmentation and separation , 2015, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[26]  Stephan Gerlach,et al.  Combination of MVDR beamforming and single-channel spectral processing for enhancing noisy and reverberant speech , 2015, EURASIP J. Adv. Signal Process..

[27]  Guy J. Brown,et al.  Exploiting Deep Neural Networks and Head Movements for Robust Binaural Localization of Multiple Sources in Reverberant Environments , 2017, IEEE/ACM Transactions on Audio, Speech, and Language Processing.

[28]  DeLiang Wang,et al.  Combining Spectral and Spatial Features for Deep Learning Based Blind Speaker Separation , 2019, IEEE/ACM Transactions on Audio, Speech, and Language Processing.

[29]  Emanuel A. P. Habets,et al.  Time–Frequency Masking Based Online Multi-Channel Speech Enhancement With Convolutional Recurrent Neural Networks , 2019, IEEE Journal of Selected Topics in Signal Processing.

[30]  Stefan B. Williams,et al.  Sound Source Localization in a Multipath Environment Using Convolutional Neural Networks , 2017, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[31]  Zhong-Qiu Wang,et al.  All-Neural Multi-Channel Speech Enhancement , 2018, INTERSPEECH.

[32]  Soumitro Chakrabarty,et al.  Multi-Speaker DOA Estimation Using Deep Convolutional Networks Trained With Noise Signals , 2018, IEEE Journal of Selected Topics in Signal Processing.

[33]  DeLiang Wang,et al.  Towards Scaling Up Classification-Based Speech Separation , 2013, IEEE Transactions on Audio, Speech, and Language Processing.

[34]  DeLiang Wang,et al.  A speech enhancement algorithm by iterating single- and multi-microphone processing and its application to robust ASR , 2017, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[35]  Scott Wisdom,et al.  Differentiable Consistency Constraints for Improved Deep Speech Enhancement , 2018, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[36]  Simon Dixon,et al.  Wave-U-Net: A Multi-Scale Neural Network for End-to-End Audio Source Separation , 2018, ISMIR.

[37]  Emanuel A. P. Habets,et al.  Time-Frequency Masking Based Online Speech Enhancement with Multi-Channel Data Using Convolutional Neural Networks , 2018, 2018 16th International Workshop on Acoustic Signal Enhancement (IWAENC).

[38]  DeLiang Wang,et al.  Robust Speaker Localization Guided by Deep Learning-Based Time-Frequency Masking , 2019, IEEE/ACM Transactions on Audio, Speech, and Language Processing.

[39]  Hakan Erdogan,et al.  Multi-Microphone Neural Speech Separation for Far-Field Multi-Talker Speech Recognition , 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[40]  Haizhou Li,et al.  A learning-based approach to direction of arrival estimation in noisy and reverberant environments , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[41]  Zhong-Qiu Wang,et al.  Deep Learning Based Multi-Channel Speaker Recognition in Noisy and Reverberant Environments , 2019, INTERSPEECH.

[42]  Dong Yu,et al.  Multitalker Speech Separation With Utterance-Level Permutation Invariant Training of Deep Recurrent Neural Networks , 2017, IEEE/ACM Transactions on Audio, Speech, and Language Processing.

[43]  Thomas Brox,et al.  U-Net: Convolutional Networks for Biomedical Image Segmentation , 2015, MICCAI.