Streaming End-to-end Speech Recognition for Mobile Devices
暂无分享,去创建一个
Tara N. Sainath | Ding Zhao | Ian McGraw | Yonghui Wu | Khe Chai Sim | Tom Bagby | Alexander Gruenstein | Rohit Prabhavalkar | Qiao Liang | Bo Li | David Rybach | Anjuli Kannan | Golan Pundak | Ruoming Pang | Yanzhang He | Shuo-Yiin Chang | Deepti Bhatia | Kanishka Rao | Raziel Alvarez | Yuan Shangguan | Kanishka Rao | Yonghui Wu | Rohit Prabhavalkar | Anjuli Kannan | Bo Li | David Rybach | Ian McGraw | K. Sim | Alexander Gruenstein | Yanzhang He | Ruoming Pang | G. Pundak | Ding Zhao | Shuo-yiin Chang | Qiao Liang | Tom Bagby | R. Álvarez | Deepti Bhatia | Yuan Shangguan | A. Gruenstein
[1] Francoise Beaufays,et al. “Your Word is my Command”: Google Search by Voice: A Case Study , 2010 .
[2] Tara N. Sainath,et al. An Analysis of Incorporating an External Language Model into a Sequence-to-Sequence Model , 2017, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[3] Alexander Gutkin,et al. Recent Advances in Google Real-Time HMM-Driven Unit Selection Synthesizer , 2016, INTERSPEECH.
[4] Yajie Miao,et al. EESEN: End-to-end speech recognition using deep RNN models and WFST-based decoding , 2015, 2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU).
[5] Jürgen Schmidhuber,et al. Long Short-Term Memory , 1997, Neural Computation.
[6] Hagen Soltau,et al. Reducing the computational complexity for whole word models , 2017, 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
[7] Tara N. Sainath,et al. State-of-the-Art Speech Recognition with Sequence-to-Sequence Models , 2017, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[8] Tara N. Sainath,et al. Contextual Speech Recognition in End-to-end Neural Network Systems Using Beam Search , 2018, INTERSPEECH.
[9] Sercan Ömer Arik,et al. Convolutional Recurrent Neural Networks for Small-Footprint Keyword Spotting , 2017, INTERSPEECH.
[10] Wei Li,et al. Streaming small-footprint keyword spotting using sequence-to-sequence models , 2017, 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
[11] Navdeep Jaitly,et al. An RNN Model of Text Normalization , 2017, INTERSPEECH.
[12] Yoshua Bengio,et al. End-to-end attention-based large vocabulary speech recognition , 2015, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[13] Tara N. Sainath,et al. Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups , 2012, IEEE Signal Processing Magazine.
[14] Fernando Pereira,et al. Weighted finite-state transducers in speech recognition , 2002, Comput. Speech Lang..
[15] Brian Roark,et al. Composition-based on-the-fly rescoring for salient n-gram biasing , 2015, INTERSPEECH.
[16] Tara N. Sainath,et al. Deep Context: End-to-end Contextual Speech Recognition , 2018, 2018 IEEE Spoken Language Technology Workshop (SLT).
[17] Tara N. Sainath,et al. Generation of Large-Scale Simulated Utterances in Virtual Rooms to Train Deep-Neural Networks for Far-Field Speech Recognition in Google Home , 2017, INTERSPEECH.
[18] Brian Roark,et al. Bringing contextual information to google speech recognition , 2015, INTERSPEECH.
[19] Shinji Watanabe,et al. Joint CTC-attention based end-to-end speech recognition using multi-task learning , 2016, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[20] Keith B. Hall,et al. Sequence-based class tagging for robust transcription in ASR , 2015, INTERSPEECH.
[21] Ian McGraw,et al. Personalized speech recognition on mobile devices , 2016, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[22] Quoc V. Le,et al. Listen, Attend and Spell , 2015, ArXiv.
[23] Heiga Zen,et al. Parallel WaveNet: Fast High-Fidelity Speech Synthesis , 2017, ICML.
[24] Rohit Prabhavalkar,et al. Exploring architectures, data and units for streaming end-to-end speech recognition with RNN-transducer , 2017, 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
[25] Tara N. Sainath,et al. Improving the efficiency of forward-backward algorithm using batched computation in TensorFlow , 2017, 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
[26] Mike Schuster,et al. Japanese and Korean voice search , 2012, 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[27] David A. Patterson,et al. In-datacenter performance analysis of a tensor processing unit , 2017, 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA).
[28] Alex Graves,et al. Sequence Transduction with Recurrent Neural Networks , 2012, ArXiv.
[29] Jordan Cohen,et al. Embedded speech recognition applications in mobile phones: Status, trends, and challenges , 2008, 2008 IEEE International Conference on Acoustics, Speech and Signal Processing.
[30] Navdeep Jaitly,et al. Towards End-To-End Speech Recognition with Recurrent Neural Networks , 2014, ICML.
[31] Hagen Soltau,et al. Neural Speech Recognizer: Acoustic-to-Word LSTM Model for Large Vocabulary Speech Recognition , 2016, INTERSPEECH.
[32] Erich Elsen,et al. Deep Speech: Scaling up end-to-end speech recognition , 2014, ArXiv.
[33] Jürgen Schmidhuber,et al. Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks , 2006, ICML.
[34] Martín Abadi,et al. TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems , 2016, ArXiv.
[35] Alvarez Raziel,et al. End-to-end Streaming Keyword Spotting , 2018, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[36] Tara N. Sainath,et al. Lower Frame Rate Neural Network Acoustic Models , 2016, INTERSPEECH.
[37] Geoffrey E. Hinton,et al. Layer Normalization , 2016, ArXiv.
[38] Rohit Prabhavalkar,et al. On the Efficient Representation and Execution of Deep Acoustic Models , 2016, INTERSPEECH.
[39] Tara N. Sainath,et al. Semi-supervised Training for End-to-end Models via Weak Distillation , 2019, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[40] Tanja Schultz,et al. Speechalator: two-way speech-to-speech translation on a consumer PDA , 2003, INTERSPEECH.
[41] Andrew W. Senior,et al. Long short-term memory recurrent neural network architectures for large scale acoustic modeling , 2014, INTERSPEECH.