Gaussian density guided deep neural network for single-channel speech enhancement

Recently, the minimum mean squared error (MMSE) has been a benchmark of optimization criterion for deep neural network (DNN) based speech enhancement. In this study, a probabilistic learning framework to estimate the DNN parameters for single-channel speech enhancement is proposed. First, the statistical analysis shows that the prediction error vector at the DNN output well follows a unimodal density for each log-power spectral component. Accordingly, we present a maximum likelihood (ML) approach to DNN parameter learning by charactering the prediction error vector as a multivariate Gaussian density with a zero mean vector and an unknown covariance matrix. It is demonstrated that the proposed learning approach can achieve a better generalization capability than MMSE-based DNN learning for unseen noise types, which can significantly reduce the speech distortions in low SNR environments.

[1]  Dong Yu,et al.  Context-Dependent Pre-Trained Deep Neural Networks for Large-Vocabulary Speech Recognition , 2012, IEEE Transactions on Audio, Speech, and Language Processing.

[2]  Jun Du,et al.  Global variance equalization for improving deep neural network based speech enhancement , 2014, 2014 IEEE China Summit & International Conference on Signal and Information Processing (ChinaSIP).

[3]  David Malah,et al.  Speech enhancement using a minimum mean-square error log-spectral amplitude estimator , 1984, IEEE Trans. Acoust. Speech Signal Process..

[4]  Jun Du,et al.  Multi-objective learning and mask-based post-processing for deep neural network based speech enhancement , 2017, INTERSPEECH.

[5]  A.V. Oppenheim,et al.  Enhancement and bandwidth compression of noisy speech , 1979, Proceedings of the IEEE.

[6]  Paris Smaragdis,et al.  Experiments on deep learning for speech denoising , 2014, INTERSPEECH.

[7]  Jun Du,et al.  Improving Deep Neural Network Based Speech Enhancement in Low SNR Environments , 2015, LVA/ICA.

[8]  Ye Li,et al.  Speech Enhancement for Non-Stationary Noise Environments , 2009, 2009 International Conference on Information Engineering and Computer Science.

[9]  Tara N. Sainath,et al.  Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups , 2012, IEEE Signal Processing Magazine.

[10]  Ji Wu,et al.  Denoising deep neural networks based voice activity detection , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.

[11]  Thomas Fang Zheng,et al.  Unseen Noise Estimation Using Separable Deep Auto Encoder for Speech Enhancement , 2016, IEEE/ACM Transactions on Audio, Speech, and Language Processing.

[12]  Israel Cohen,et al.  Noise spectrum estimation in adverse environments: improved minima controlled recursive averaging , 2003, IEEE Trans. Speech Audio Process..

[13]  Herman J. M. Steeneken,et al.  Assessment for automatic speech recognition: II. NOISEX-92: A database and an experiment to study the effect of additive noise on speech recognition systems , 1993, Speech Commun..

[14]  Andries P. Hekstra,et al.  Perceptual evaluation of speech quality (PESQ)-a new method for speech quality assessment of telephone networks and codecs , 2001, 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.01CH37221).

[15]  Andrew L. Maas,et al.  RECURRENT NEURAL NETWORK FEATURE ENHANCEMENT: THE 2nd CHIME CHALLENGE , 2013 .

[16]  Jacob Benesty,et al.  Speech Enhancement , 2010 .

[17]  Jun Du,et al.  SNR-Based Progressive Learning of Deep Neural Network for Speech Enhancement , 2016, INTERSPEECH.

[18]  Jun Du,et al.  An Experimental Study on Speech Enhancement Based on Deep Neural Networks , 2014, IEEE Signal Processing Letters.

[19]  Jesper Jensen,et al.  An Algorithm for Intelligibility Prediction of Time–Frequency Weighted Noisy Speech , 2011, IEEE Transactions on Audio, Speech, and Language Processing.

[20]  Alan V. Oppenheim,et al.  All-pole modeling of degraded speech , 1978 .

[21]  Philipos C. Loizou,et al.  Speech Enhancement: Theory and Practice , 2007 .

[22]  S. Boll,et al.  Suppression of acoustic noise in speech using spectral subtraction , 1979 .

[23]  Philipos C. Loizou,et al.  A multi-band spectral subtraction method for enhancing speech corrupted by colored noise , 2002, 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[24]  Jun Du,et al.  A speech enhancement approach using piecewise linear approximation of an explicit model of environmental distortions , 2008, INTERSPEECH.

[25]  Yu Tsao,et al.  SNR-Aware Convolutional Neural Network Modeling for Speech Enhancement , 2016, INTERSPEECH.

[26]  S. Kullback,et al.  Information Theory and Statistics , 1959 .

[27]  Björn W. Schuller,et al.  Speech Enhancement with LSTM Recurrent Neural Networks and its Application to Noise-Robust ASR , 2015, LVA/ICA.

[28]  Li-Rong Dai,et al.  A Regression Approach to Speech Enhancement Based on Deep Neural Networks , 2015, IEEE/ACM Transactions on Audio, Speech, and Language Processing.