End-to-End Training of a Large Vocabulary End-to-End Speech Recognition System
Dhananjaya N. Gowda | Chanwoo Kim | Abhinav Garg | Mehul Kumar | Kwangyoun Kim | Larry Heck | Sungsoo Kim | Jiyeon Kim | Kyungmin Lee | Changwoo Han | Eunhyang Kim | Minkyoo Shin | Shatrughan Singh