Real-time Speech Enhancement Using an Efficient Convolutional Recurrent Network for Dual-microphone Mobile Phones in Close-talk Scenarios

In mobile speech communication, the quality and intelligibility of the received speech can be severely degraded by background noise if the far-end talker is in an adverse acoustic environment. Speech enhancement algorithms are therefore typically integrated into mobile phones to remove background noise. In this paper, we propose a novel deep-learning-based framework for real-time speech enhancement on dual-microphone mobile phones in a close-talk scenario. It incorporates a convolutional recurrent network (CRN) with high computational efficiency. In addition, the framework is a causal system, which is necessary for real-time processing on mobile phones. We find that the proposed approach consistently outperforms a deep neural network (DNN)-based method, as well as two traditional methods for speech enhancement.
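To make the causality requirement concrete, the following is a minimal sketch of a frame-by-frame causal filter. It is not the CRN proposed in the paper: the function name `causal_filter`, the tap weights, and the toy frames are illustrative assumptions. It only shows the property the abstract relies on, namely that the output for a frame depends on that frame and earlier ones, never on later ones.

```python
def causal_filter(frames, weights):
    """Filter a stream of feature frames using only current and past frames.

    frames:  list of equal-length feature lists, one per time frame (hypothetical input).
    weights: filter taps; the last tap applies to the current frame,
             earlier taps apply to progressively older frames.
    """
    k = len(weights)
    n_feat = len(frames[0])
    # Start with k-1 all-zero "past" frames so the first output needs no future context.
    history = [[0.0] * n_feat for _ in range(k - 1)]
    outputs = []
    for frame in frames:  # frames arrive one at a time, as in streaming
        window = history + [frame]  # the k most recent frames, oldest first
        outputs.append(
            [sum(w * f[i] for w, f in zip(weights, window)) for i in range(n_feat)]
        )
        history = window[1:]  # drop the oldest frame, keep the rest as history
    return outputs


frames = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
out = causal_filter(frames, [0.5, 0.5])

# Changing a later frame must leave all earlier outputs untouched.
changed = [row[:] for row in frames]
changed[3] = [100.0, 100.0]
out_changed = causal_filter(changed, [0.5, 0.5])
assert out[:3] == out_changed[:3]
```

Because each output is produced as soon as its frame arrives, a system built this way adds no look-ahead delay, which is why causality matters for real-time processing on a phone.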
