A Streaming On-Device End-To-End Model Surpassing Server-Side Conventional Model Quality and Latency
暂无分享,去创建一个
Tara N. Sainath | Anjuli Kannan | Arun Narayanan | Zhifeng Chen | Ke Hu | Ding Zhao | Bo Li | Yu Zhang | Rohit Prabhavalkar | Wei Li | David Garcia | David Rybach | Chung-Cheng Chiu | Yonghui Wu | Golan Pundak | Ruoming Pang | Trevor Strohman | Antoine Bruguier | Qiao Liang | Yanzhang He | Cal Peyser | Raziel Alvarez | Minho Jin | Shuo-yiin Chang | Yuan Shangguan | Ian McGraw | Mirko Visontai | Alex Gruenstein | Yash Sheth
[1] Tara N. Sainath,et al. State-of-the-Art Speech Recognition with Sequence-to-Sequence Models , 2017, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[2] Tara N. Sainath,et al. Minimum Word Error Rate Training for Attention-Based Sequence-to-Sequence Models , 2017, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[3] Quoc V. Le,et al. Listen, Attend and Spell , 2015, ArXiv.
[4] Matt Shannon,et al. Improved End-of-Query Detection for Streaming Speech Recognition , 2017, INTERSPEECH.
[5] Yifan Gong,et al. Improving wideband speech recognition using mixed-bandwidth training data in CD-DNN-HMM , 2012, 2012 IEEE Spoken Language Technology Workshop (SLT).
[6] Bo Li,et al. A Unified Endpointer Using Multitask and Multidomain Training , 2019, 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
[7] Boris Polyak,et al. Acceleration of stochastic approximation by averaging , 1992 .
[8] Hank Liao,et al. Large scale deep neural network acoustic modeling with semi-supervised training data for YouTube video transcription , 2013, 2013 IEEE Workshop on Automatic Speech Recognition and Understanding.
[9] Jürgen Schmidhuber,et al. Bidirectional LSTM Networks for Improved Phoneme Classification and Recognition , 2005, ICANN.
[10] Tara N. Sainath,et al. Multi-Dialect Speech Recognition with a Single Sequence-to-Sequence Model , 2017, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[11] Hagen Soltau,et al. Neural Speech Recognizer: Acoustic-to-Word LSTM Model for Large Vocabulary Speech Recognition , 2016, INTERSPEECH.
[12] Tara N. Sainath,et al. Endpoint Detection Using Grid Long Short-Term Memory Networks for Streaming Speech Recognition , 2017, INTERSPEECH.
[13] Fadi Biadsy,et al. Effectively Building Tera Scale MaxEnt Language Models Incorporating Non-Linguistic Signals , 2017, INTERSPEECH.
[14] Alex Graves,et al. Sequence Transduction with Recurrent Neural Networks , 2012, ArXiv.
[15] Tara N. Sainath,et al. Lingvo: a Modular and Scalable Framework for Sequence-to-Sequence Modeling , 2019, ArXiv.
[16] Colin Raffel,et al. Monotonic Chunkwise Attention , 2017, ICLR.
[17] Tara N. Sainath,et al. Lower Frame Rate Neural Network Acoustic Models , 2016, INTERSPEECH.
[18] Lukasz Kaiser,et al. Attention is All you Need , 2017, NIPS.
[19] Tara N. Sainath,et al. Streaming End-to-end Speech Recognition for Mobile Devices , 2018, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[20] Mike Schuster,et al. Japanese and Korean voice search , 2012, 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[21] Martín Abadi,et al. TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems , 2016, ArXiv.
[22] Tara N. Sainath,et al. Two-Pass End-to-End Speech Recognition , 2019, INTERSPEECH.
[23] Rohit Prabhavalkar,et al. Exploring architectures, data and units for streaming end-to-end speech recognition with RNN-transducer , 2017, 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
[24] Shinji Watanabe,et al. Joint CTC-attention based end-to-end speech recognition using multi-task learning , 2016, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[25] Tara N. Sainath,et al. Joint Endpointing and Decoding with End-to-end Models , 2019, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[26] Tara N. Sainath,et al. Generation of Large-Scale Simulated Utterances in Virtual Rooms to Train Deep-Neural Networks for Far-Field Speech Recognition in Google Home , 2017, INTERSPEECH.
[27] Tara N. Sainath,et al. Recognizing Long-Form Speech Using Streaming End-to-End Models , 2019, 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
[28] Yiming Wang,et al. Far-Field ASR Without Parallel Data , 2016, INTERSPEECH.