Deep Multitask Acoustic Echo Cancellation

Acoustic echo cancellation or suppression methods aim to suppress the echo originating from the acoustic coupling between loudspeakers and microphones. Conventional approaches estimate the echo using adaptive filtering. Because of nonlinearities in the acoustic path of the far-end signal, further post-processing is needed to attenuate the nonlinear echo components. In this paper, we propose a novel architecture based on deep gated recurrent neural networks to estimate the near-end signal from the microphone signal. The proposed architecture is trained with multitask learning: it learns the auxiliary task of estimating the echo in order to improve the main task of estimating the clean near-end speech signal. Experimental results show that the proposed deep-learning-based method outperforms existing methods for unseen speakers in terms of echo return loss enhancement (ERLE) during single-talk periods and the perceptual evaluation of speech quality (PESQ) score during double-talk periods.
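The sketch below illustrates two ideas from the abstract under stated assumptions. The first is a multitask objective that adds a weighted auxiliary echo-estimation loss to the main near-end estimation loss. The abstract does not give the loss form, so the mean squared error and the weight `echo_weight` are hypothetical choices; the inputs could be, for example, frames of magnitude-spectrum values. The second is the standard ERLE measure, ERLE = 10 log10(E[y^2] / E[q^2]), where y is the microphone signal and q is the system output during far-end single-talk, with the expectations approximated by time averages. The helper names are illustrative and are not taken from the paper.

```python
import math
from typing import Sequence


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equal-length, non-empty sequences."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def multitask_loss(
    nearend_est: Sequence[float],
    nearend_ref: Sequence[float],
    echo_est: Sequence[float],
    echo_ref: Sequence[float],
    echo_weight: float = 0.5,  # hypothetical weight for the auxiliary task
) -> float:
    """Main-task loss (near-end speech) plus a weighted auxiliary loss (echo).

    Assumption: both tasks use MSE; the paper's exact objective may differ.
    """
    return mse(nearend_est, nearend_ref) + echo_weight * mse(echo_est, echo_ref)


def erle_db(mic: Sequence[float], output: Sequence[float], eps: float = 1e-12) -> float:
    """Echo return loss enhancement in dB over a far-end single-talk segment.

    mic is the microphone signal y(n); output is the processed signal q(n).
    Signal power is approximated by the mean of the squared samples, and eps
    guards against division by zero and log of zero.
    """
    mic_power = sum(x * x for x in mic) / len(mic)
    out_power = sum(x * x for x in output) / len(output)
    return 10.0 * math.log10((mic_power + eps) / (out_power + eps))


if __name__ == "__main__":
    # Output attenuated by a factor of 10 in amplitude -> about 20 dB ERLE.
    print(round(erle_db([1.0, -1.0, 1.0, -1.0], [0.1, -0.1, 0.1, -0.1]), 3))
    # Main-task MSE 2.0 plus 0.5 * auxiliary MSE 1.0 -> 2.5.
    print(multitask_loss([1.0, 2.0], [1.0, 0.0], [0.0, 0.0], [1.0, 1.0]))
```

Because the echo target shares its source with the microphone signal, weighting the echo loss below the near-end loss keeps the main task dominant while still giving the recurrent layers an extra training signal. The weight is a tuning choice rather than a value reported here.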