Paper information - A network of deep neural networks for Distant Speech Recognition

Despite the remarkable progress recently made in distant speech recognition, state-of-the-art technology still lacks robustness, especially in adverse acoustic conditions characterized by non-stationary noise and reverberation.
