Recognizing Long-Form Speech Using Streaming End-to-End Models
暂无分享,去创建一个
Tara N. Sainath | Arun Narayanan | Rohit Prabhavalkar | David Rybach | Chung-Cheng Chiu | Trevor Strohman | C. Chiu | Rohit Prabhavalkar | David Rybach | Trevor Strohman | A. Narayanan
[1] Tara N. Sainath,et al. State-of-the-Art Speech Recognition with Sequence-to-Sequence Models , 2017, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[2] Tara N. Sainath,et al. Generation of Large-Scale Simulated Utterances in Virtual Rooms to Train Deep-Neural Networks for Far-Field Speech Recognition in Google Home , 2017, INTERSPEECH.
[3] Arun Narayanan,et al. Toward Domain-Invariant Speech Recognition via Large Scale Training , 2018, 2018 IEEE Spoken Language Technology Workshop (SLT).
[4] Yoshua Bengio,et al. Attention-Based Models for Speech Recognition , 2015, NIPS.
[5] Hermann Ney,et al. Improved training of end-to-end attention models for speech recognition , 2018, INTERSPEECH.
[6] Sanjeev Khudanpur,et al. Investigation of transfer learning for ASR using LF-MMI trained neural networks , 2017, 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
[7] Quoc V. Le,et al. A Neural Transducer , 2015, 1511.04868.
[8] Tara N. Sainath,et al. Two-Pass End-to-End Speech Recognition , 2019, INTERSPEECH.
[9] Tara N. Sainath,et al. FUNDAMENTAL TECHNOLOGIES IN MODERN SPEECH RECOGNITION Digital Object Identifier 10.1109/MSP.2012.2205597 , 2012 .
[10] Rohit Prabhavalkar,et al. Exploring architectures, data and units for streaming end-to-end speech recognition with RNN-transducer , 2017, 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
[11] Mike Schuster,et al. Japanese and Korean voice search , 2012, 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[12] David A. Patterson,et al. In-datacenter performance analysis of a tensor processing unit , 2017, 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA).
[13] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[14] Alex Graves,et al. Sequence Transduction with Recurrent Neural Networks , 2012, ArXiv.
[15] Geoffrey E. Hinton,et al. Speech recognition with deep recurrent neural networks , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[16] Tara N. Sainath,et al. A Comparison of End-to-End Models for Long-Form Speech Recognition , 2019, 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
[17] Hairong Liu,et al. Exploring neural transducers for end-to-end speech recognition , 2017, 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
[18] Jürgen Schmidhuber,et al. Long Short-Term Memory , 1997, Neural Computation.
[19] PAUL J. WERBOS,et al. Generalization of backpropagation with application to a recurrent gas market model , 1988, Neural Networks.
[20] Yuan Yu,et al. TensorFlow: A system for large-scale machine learning , 2016, OSDI.
[21] Joseph Alexander Altsheler. The Last of the Chiefs: A Story of the Great Sioux War , 2007 .
[22] Quoc V. Le,et al. Sequence to Sequence Learning with Neural Networks , 2014, NIPS.
[23] Yu Zhang,et al. Advances in Joint CTC-Attention Based End-to-End Speech Recognition with a Deep CNN Encoder and RNN-LM , 2017, INTERSPEECH.
[24] Tara N. Sainath,et al. A Comparison of Sequence-to-Sequence Models for Speech Recognition , 2017, INTERSPEECH.
[25] Tara N. Sainath,et al. Lingvo: a Modular and Scalable Framework for Sequence-to-Sequence Modeling , 2019, ArXiv.
[26] Quoc V. Le,et al. Listen, attend and spell: A neural network for large vocabulary conversational speech recognition , 2015, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[27] Jinyu Li,et al. Feature Learning in Deep Neural Networks - Studies on Speech Recognition Tasks. , 2013, ICLR 2013.
[28] Chong Wang,et al. Deep Speech 2 : End-to-End Speech Recognition in English and Mandarin , 2015, ICML.
[29] Sepp Hochreiter,et al. The Vanishing Gradient Problem During Learning Recurrent Neural Nets and Problem Solutions , 1998, Int. J. Uncertain. Fuzziness Knowl. Based Syst..
[30] Colin Raffel,et al. Monotonic Chunkwise Attention , 2017, ICLR.
[31] Hank Liao,et al. Large scale deep neural network acoustic modeling with semi-supervised training data for YouTube video transcription , 2013, 2013 IEEE Workshop on Automatic Speech Recognition and Understanding.
[32] Jürgen Schmidhuber,et al. Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks , 2006, ICML.
[33] Yiming Wang,et al. Far-Field ASR Without Parallel Data , 2016, INTERSPEECH.
[34] Frederick Jelinek,et al. Continuous speech recognition , 1977, SGAR.
[35] Erich Elsen,et al. Deep Speech: Scaling up end-to-end speech recognition , 2014, ArXiv.
[36] Hagen Soltau,et al. Neural Speech Recognizer: Acoustic-to-Word LSTM Model for Large Vocabulary Speech Recognition , 2016, INTERSPEECH.
[37] Bhuvana Ramabhadran,et al. Direct Acoustics-to-Word Models for English Conversational Speech Recognition , 2017, INTERSPEECH.
[38] Tara N. Sainath,et al. Streaming End-to-end Speech Recognition for Mobile Devices , 2018, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[39] Yiming Yang,et al. Transformer-XL: Attentive Language Models beyond a Fixed-Length Context , 2019, ACL.
[40] Bruce T. Lowerre,et al. The HARPY speech recognition system , 1976 .
[41] Quoc V. Le,et al. SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition , 2019, INTERSPEECH.
[42] Alexander Gutkin,et al. Recent Advances in Google Real-Time HMM-Driven Unit Selection Synthesizer , 2016, INTERSPEECH.