Modular dynamic deep denoising autoencoder for speech enhancement

The deep denoising autoencoder (DDAE) is an effective method for noise reduction and speech enhancement. However, a single DDAE with a fixed number of input frames cannot extract sufficient contextual information. It also generalizes poorly to unseen SNRs (signal-to-noise ratios), and its enhanced output retains residual noise. In this paper, we use a modular model in which three DDAEs with different window lengths are stacked. Experimental results show that the proposed architecture, the modular dynamic deep denoising autoencoder (MD-DDAE), outperforms traditional DDAE models under various noisy conditions.
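The abstract describes stacking three DDAEs that each see a different number of context frames. The sketch below is a minimal illustration of that idea, assuming a PyTorch implementation; the layer sizes, window lengths (3, 5, 7 frames), number of frequency bins (257), and the linear fusion of the three module outputs are all hypothetical choices, not the paper's exact configuration.

```python
import torch
import torch.nn as nn


class DDAE(nn.Module):
    """A simple deep denoising autoencoder that maps a window of noisy
    spectral frames to an estimate of the clean centre frame
    (layer sizes are illustrative)."""

    def __init__(self, n_freq_bins: int, n_frames: int, hidden: int = 512):
        super().__init__()
        in_dim = n_freq_bins * n_frames
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_freq_bins),  # clean-frame estimate
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, n_frames, n_freq_bins) -> flatten the context window
        return self.net(x.flatten(start_dim=1))


class ModularDDAE(nn.Module):
    """Three DDAEs with different context-window lengths; their per-frame
    estimates are concatenated and fused by a small output layer
    (the fusion strategy here is an assumption for illustration)."""

    def __init__(self, n_freq_bins: int = 257, windows=(3, 5, 7)):
        super().__init__()
        self.windows = windows
        self.ddaes = nn.ModuleList(DDAE(n_freq_bins, w) for w in windows)
        self.fusion = nn.Linear(n_freq_bins * len(windows), n_freq_bins)

    def forward(self, contexts):
        # contexts: one tensor per module, each shaped
        # (batch, window_length, n_freq_bins) around the same centre frame
        outs = [m(c) for m, c in zip(self.ddaes, contexts)]
        return self.fusion(torch.cat(outs, dim=1))


# Usage: feed each module its own context window around the same centre frame.
model = ModularDDAE()
batch = [torch.randn(8, w, 257) for w in model.windows]
enhanced_frame = model(batch)  # (8, 257) estimate of the clean spectrum
```

In this sketch the short-window module captures local detail while the longer windows supply broader context; how the real MD-DDAE combines the three module outputs may differ from the simple linear fusion used here.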
