Speech Enhancement Based on Cepstral Mapping and Deep Neural Networks

In this paper, we present three strategies for speech enhancement based on cepstral mapping and deep neural networks (DNNs). First, we apply a DNN to directly predict the clean-speech cepstral features from noisy cepstral input; the desired clean speech is then obtained by waveform reconstruction. Compared with directly mapping log-power spectra (LPS), this method recovers the harmonic structure of speech more effectively and yields higher speech quality. Second, we use a DNN to estimate the ideal Wiener filter from noisy cepstral input. Finally, we propose a fusion framework that combines cepstral feature mapping and Wiener filtering to obtain the enhanced speech signal. Experiments show that the proposed algorithms achieve state-of-the-art performance in improving the quality and intelligibility of noisy speech.
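The two regression targets named above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not give the exact feature definition, frame length, or network, so the sketch assumes a real cepstrum computed from the log-power spectrum of a single 512-sample frame, reconstruction using the noisy phase, and the textbook ideal Wiener gain computed from known clean and noise frames. The function names (`real_cepstrum`, `cepstrum_to_frame`, `ideal_wiener_gain`) are hypothetical, and the DNN itself is omitted.

```python
import numpy as np

def real_cepstrum(frame, n_fft=512, eps=1e-10):
    """Return the real cepstrum and phase of one frame (assumed feature definition)."""
    spec = np.fft.rfft(frame, n_fft)
    log_power = np.log(np.abs(spec) ** 2 + eps)  # log-power spectrum (LPS)
    cep = np.fft.irfft(log_power, n_fft)         # inverse DFT of the LPS
    return cep, np.angle(spec)

def cepstrum_to_frame(cep, phase, n_fft=512):
    """Rebuild a time-domain frame from a cepstrum and a phase spectrum."""
    log_power = np.fft.rfft(cep, n_fft).real     # back to the LPS
    magnitude = np.sqrt(np.exp(log_power))
    return np.fft.irfft(magnitude * np.exp(1j * phase), n_fft)

def ideal_wiener_gain(clean, noise, n_fft=512, eps=1e-10):
    """Per-bin ideal Wiener gain |S|^2 / (|S|^2 + |N|^2), assuming both signals are known."""
    clean_power = np.abs(np.fft.rfft(clean, n_fft)) ** 2
    noise_power = np.abs(np.fft.rfft(noise, n_fft)) ** 2
    return clean_power / (clean_power + noise_power + eps)
```

In a mapping system of this kind, a trained network would replace the cepstrum passed to `cepstrum_to_frame` with its estimate of the clean cepstrum, or would predict the gain directly, which is then multiplied with the noisy spectrum before the inverse transform.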
