Modular dynamic deep denoising autoencoder for speech enhancement

The deep denoising autoencoder (DDAE) is an effective method for noise reduction and speech enhancement. However, a single DDAE with a fixed number of input frames cannot extract sufficient contextual information. It also generalizes poorly to unseen SNRs (signal-to-noise ratios), and its enhanced output retains residual noise. In this paper, we use a modular model in which three DDAEs with different window lengths are stacked. Experimental results show that the proposed architecture, the modular dynamic deep denoising autoencoder (MD-DDAE), outperforms traditional DDAE models under various noisy conditions.
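The abstract describes stacking three DDAEs that each see a different number of context frames. The sketch below is a minimal illustration of that idea, assuming a PyTorch implementation; the layer sizes, window lengths (3, 5, 7 frames), number of frequency bins (257), and the linear fusion of the three module outputs are all hypothetical choices, not the paper's exact configuration.

```python
import torch
import torch.nn as nn


class DDAE(nn.Module):
    """A simple deep denoising autoencoder that maps a window of noisy
    spectral frames to an estimate of the clean centre frame
    (layer sizes are illustrative)."""

    def __init__(self, n_freq_bins: int, n_frames: int, hidden: int = 512):
        super().__init__()
        in_dim = n_freq_bins * n_frames
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_freq_bins),  # clean-frame estimate
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, n_frames, n_freq_bins) -> flatten the context window
        return self.net(x.flatten(start_dim=1))


class ModularDDAE(nn.Module):
    """Three DDAEs with different context-window lengths; their per-frame
    estimates are concatenated and fused by a small output layer
    (the fusion strategy here is an assumption for illustration)."""

    def __init__(self, n_freq_bins: int = 257, windows=(3, 5, 7)):
        super().__init__()
        self.windows = windows
        self.ddaes = nn.ModuleList(DDAE(n_freq_bins, w) for w in windows)
        self.fusion = nn.Linear(n_freq_bins * len(windows), n_freq_bins)

    def forward(self, contexts):
        # contexts: one tensor per module, each shaped
        # (batch, window_length, n_freq_bins) around the same centre frame
        outs = [m(c) for m, c in zip(self.ddaes, contexts)]
        return self.fusion(torch.cat(outs, dim=1))


# Usage: feed each module its own context window around the same centre frame.
model = ModularDDAE()
batch = [torch.randn(8, w, 257) for w in model.windows]
enhanced_frame = model(batch)  # (8, 257) estimate of the clean spectrum
```

In this sketch the short-window module captures local detail while the longer windows supply broader context; how the real MD-DDAE combines the three module outputs may differ from the simple linear fusion used here.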
