The MERL/MELCO/TUM system for the REVERB Challenge using Deep Recurrent Neural Network Feature Enhancement
Yuuki Tachioka | Jonathan Le Roux | John R. Hershey | Shinji Watanabe | Björn Schuller | Gerhard Rigoll | Felix Weninger | Jürgen T. Geiger
[1] G. Carter,et al. The generalized correlation method for estimation of time delay , 1976 .
[2] David G. Long,et al. Array signal processing , 1985, IEEE Trans. Acoust. Speech Signal Process.
[3] C. Burrus,et al. Array Signal Processing , 1989 .
[4] Steve Renals,et al. WSJCAM0: a British English speech corpus for large vocabulary continuous speech recognition , 1995, 1995 International Conference on Acoustics, Speech, and Signal Processing.
[5] Jürgen Schmidhuber,et al. Long Short-Term Memory , 1997, Neural Computation.
[6] Mark J. F. Gales,et al. Semi-tied covariance matrices for hidden Markov models , 1999, IEEE Trans. Speech Audio Process.
[7] Jürgen Schmidhuber,et al. Learning to forget: continual prediction with LSTM , 1999 .
[8] Satoshi Nakamura,et al. Localization of multiple sound sources based on a CSP analysis with a microphone array , 2000, 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing.
[9] I. McCowan,et al. The multi-channel Wall Street Journal audio visual corpus (MC-WSJ-AV): specification and initial experiments , 2005, IEEE Workshop on Automatic Speech Recognition and Understanding, 2005.
[10] Steve Young,et al. The HTK book version 3.4 , 2006 .
[11] Yoshua Bengio,et al. Extracting and composing robust features with denoising autoencoders , 2008, ICML '08.
[12] Takashi Suzuki,et al. Sound source direction estimation based on subband peak-hold processing , 2009 .
[13] Kaisheng Yao,et al. A basis method for robust estimation of constrained MLLR , 2011, 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[14] Daniel Povey,et al. The Kaldi Speech Recognition Toolkit , 2011 .
[15] Alex Graves,et al. Practical Variational Inference for Neural Networks , 2011, NIPS.
[16] Haihua Xu,et al. Minimum Bayes Risk decoding and system combination based on a recursion for edit distance , 2011, Comput. Speech Lang.
[17] Geoffrey E. Hinton,et al. Understanding how Deep Belief Networks perform acoustic modelling , 2012, 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[18] Yuuki Tachioka,et al. Direction of arrival estimation by cross-power spectrum phase analysis using prior distributions and voice activity detection information , 2012 .
[19] Hermann Ney,et al. LSTM Neural Networks for Language Modeling , 2012, INTERSPEECH.
[20] Yasuo Horiuchi,et al. Reverberant speech recognition based on denoising autoencoder , 2013, INTERSPEECH.
[21] Yongqiang Wang,et al. An investigation of deep neural networks for noise robust speech recognition , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[22] Yuuki Tachioka,et al. Discriminative methods for noise robust speech recognition: A CHiME challenge benchmark , 2013 .
[23] Geoffrey E. Hinton,et al. Speech recognition with deep recurrent neural networks , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[24] Björn Schuller,et al. The TUM+TUT+KUL approach to the CHiME challenge 2013: Multi-stream ASR exploiting BLSTM networks and sparse NMF , 2013 .
[25] Andrew L. Maas,et al. Recurrent neural network feature enhancement: The 2nd CHiME challenge , 2013 .
[26] Tomohiro Nakatani,et al. The reverb challenge: A common evaluation framework for dereverberation and recognition of reverberant speech , 2013, 2013 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics.
[27] Jon Barker,et al. The second ‘CHiME’ speech separation and recognition challenge: An overview of challenge systems and outcomes , 2013, 2013 IEEE Workshop on Automatic Speech Recognition and Understanding.
[28] Björn W. Schuller,et al. Noise robust ASR in reverberated multisource environments applying convolutive NMF and Long Short-Term Memory , 2013, Comput. Speech Lang.
[29] Yuuki Tachioka,et al. Deep recurrent de-noising auto-encoder and blind de-reverberation for reverberated speech recognition , 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[30] Björn W. Schuller,et al. Feature enhancement by deep LSTM networks for ASR in reverberant multisource environments , 2014, Comput. Speech Lang.