论文信息 - Gaussian density guided deep neural network for single-channel speech enhancement

Gaussian density guided deep neural network for single-channel speech enhancement

Recently, the minimum mean squared error (MMSE) has been a benchmark of optimization criterion for deep neural network (DNN) based speech enhancement. In this study, a probabilistic learning framework to estimate the DNN parameters for single-channel speech enhancement is proposed. First, the statistical analysis shows that the prediction error vector at the DNN output well follows a unimodal density for each log-power spectral component. Accordingly, we present a maximum likelihood (ML) approach to DNN parameter learning by charactering the prediction error vector as a multivariate Gaussian density with a zero mean vector and an unknown covariance matrix. It is demonstrated that the proposed learning approach can achieve a better generalization capability than MMSE-based DNN learning for unseen noise types, which can significantly reduce the speech distortions in low SNR environments.

[1] Dong Yu,et al. Context-Dependent Pre-Trained Deep Neural Networks for Large-Vocabulary Speech Recognition , 2012, IEEE Transactions on Audio, Speech, and Language Processing.

[2] Jun Du,et al. Global variance equalization for improving deep neural network based speech enhancement , 2014, 2014 IEEE China Summit & International Conference on Signal and Information Processing (ChinaSIP).

[3] David Malah,et al. Speech enhancement using a minimum mean-square error log-spectral amplitude estimator , 1984, IEEE Trans. Acoust. Speech Signal Process..

[4] Jun Du,et al. Multi-objective learning and mask-based post-processing for deep neural network based speech enhancement , 2017, INTERSPEECH.

[5] A.V. Oppenheim,et al. Enhancement and bandwidth compression of noisy speech , 1979, Proceedings of the IEEE.

[6] Paris Smaragdis,et al. Experiments on deep learning for speech denoising , 2014, INTERSPEECH.

[7] Jun Du,et al. Improving Deep Neural Network Based Speech Enhancement in Low SNR Environments , 2015, LVA/ICA.

[8] Ye Li,et al. Speech Enhancement for Non-Stationary Noise Environments , 2009, 2009 International Conference on Information Engineering and Computer Science.

[9] Tara N. Sainath,et al. Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups , 2012, IEEE Signal Processing Magazine.

[10] Ji Wu,et al. Denoising deep neural networks based voice activity detection , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.

[11] Thomas Fang Zheng,et al. Unseen Noise Estimation Using Separable Deep Auto Encoder for Speech Enhancement , 2016, IEEE/ACM Transactions on Audio, Speech, and Language Processing.

[12] Israel Cohen,et al. Noise spectrum estimation in adverse environments: improved minima controlled recursive averaging , 2003, IEEE Trans. Speech Audio Process..

[13] Herman J. M. Steeneken,et al. Assessment for automatic speech recognition: II. NOISEX-92: A database and an experiment to study the effect of additive noise on speech recognition systems , 1993, Speech Commun..

[14] Andries P. Hekstra,et al. Perceptual evaluation of speech quality (PESQ)-a new method for speech quality assessment of telephone networks and codecs , 2001, 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.01CH37221).

[15] Andrew L. Maas,et al. RECURRENT NEURAL NETWORK FEATURE ENHANCEMENT: THE 2nd CHIME CHALLENGE , 2013 .

[16] Jacob Benesty,et al. Speech Enhancement , 2010 .

[17] Jun Du,et al. SNR-Based Progressive Learning of Deep Neural Network for Speech Enhancement , 2016, INTERSPEECH.

[18] Jun Du,et al. An Experimental Study on Speech Enhancement Based on Deep Neural Networks , 2014, IEEE Signal Processing Letters.

[19] Jesper Jensen,et al. An Algorithm for Intelligibility Prediction of Time–Frequency Weighted Noisy Speech , 2011, IEEE Transactions on Audio, Speech, and Language Processing.

[20] Alan V. Oppenheim,et al. All-pole modeling of degraded speech , 1978 .

[21] Philipos C. Loizou,et al. Speech Enhancement: Theory and Practice , 2007 .

[22] S. Boll,et al. Suppression of acoustic noise in speech using spectral subtraction , 1979 .

[23] Philipos C. Loizou,et al. A multi-band spectral subtraction method for enhancing speech corrupted by colored noise , 2002, 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[24] Jun Du,et al. A speech enhancement approach using piecewise linear approximation of an explicit model of environmental distortions , 2008, INTERSPEECH.

[25] Yu Tsao,et al. SNR-Aware Convolutional Neural Network Modeling for Speech Enhancement , 2016, INTERSPEECH.

[26] S. Kullback,et al. Information Theory and Statistics , 1959 .

[27] Björn W. Schuller,et al. Speech Enhancement with LSTM Recurrent Neural Networks and its Application to Noise-Robust ASR , 2015, LVA/ICA.

[28] Li-Rong Dai,et al. A Regression Approach to Speech Enhancement Based on Deep Neural Networks , 2015, IEEE/ACM Transactions on Audio, Speech, and Language Processing.