Late Reverberation Suppression Using Recurrent Neural Networks with Long Short-Term Memory

Human speech is usually distorted by room reverberation. These distortions degrade speech quality and intelligibility, especially when the reverberation time is long, and they also pose a serious problem for many speech-related applications such as automatic speech recognition. In this paper, we propose a supervised speech dereverberation algorithm that models late reverberation using a recurrent neural network (RNN) with long short-term memory (LSTM). By exploiting the LSTM's ability to capture long temporal context, the proposed approach effectively removes late reverberation. Systematic evaluations show that our approach improves the quality of reverberant speech across a wide range of reverberant conditions. Moreover, the proposed system is causal and can therefore be used in real-time applications.
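
As a concrete illustration of the causal, supervised LSTM formulation described above, the following is a minimal PyTorch sketch. The feature representation (log-magnitude STFT frames), network sizes, training target, loss, and optimizer are assumptions made for illustration only and are not taken from the paper.

```python
# Hypothetical sketch of a causal LSTM-based dereverberation model.
# Feature/target choices are illustrative assumptions, not the paper's exact setup.
import torch
import torch.nn as nn


class CausalLSTMDereverb(nn.Module):
    def __init__(self, n_freq=257, hidden=512, layers=2):
        super().__init__()
        # A unidirectional LSTM keeps the system causal: each output frame
        # depends only on the current and past input frames.
        self.lstm = nn.LSTM(input_size=n_freq, hidden_size=hidden,
                            num_layers=layers, batch_first=True)
        self.out = nn.Linear(hidden, n_freq)

    def forward(self, reverb_logmag):
        # reverb_logmag: (batch, frames, n_freq) log-magnitude spectrogram
        h, _ = self.lstm(reverb_logmag)
        # Frame-wise projection to the target spectrum
        # (e.g., speech with late reverberation suppressed).
        return self.out(h)


if __name__ == "__main__":
    model = CausalLSTMDereverb()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.MSELoss()

    # Dummy batch standing in for paired reverberant/target features.
    x = torch.randn(4, 100, 257)   # reverberant input frames
    y = torch.randn(4, 100, 257)   # corresponding supervised targets

    optimizer.zero_grad()
    pred = model(x)
    loss = loss_fn(pred, y)
    loss.backward()
    optimizer.step()
    print(f"training loss: {loss.item():.4f}")
```

Because the recurrence runs forward in time only, the trained model can be applied frame by frame at inference, which is what makes the approach suitable for real-time use.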
