FastEmit: Low-Latency Streaming ASR with Sequence-Level Emission Regularization
Tara N. Sainath | Arun Narayanan | Jiahui Yu | Bo Li | Wei Han | Chung-Cheng Chiu | Yonghui Wu | Ruoming Pang | Anmol Gulati | Yanzhang He | Shuo-yiin Chang
[1] Quoc V. Le, et al. Listen, Attend and Spell, 2015, ArXiv.
[2] Tara N. Sainath, et al. A Comparison of End-to-End Models for Long-Form Speech Recognition, 2019, 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
[3] Tara N. Sainath, et al. A Streaming On-Device End-To-End Model Surpassing Server-Side Conventional Model Quality and Latency, 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[4] Qian Zhang, et al. Transformer Transducer: A Streamable Speech Recognition Model with Transformer Encoders and RNN-T Loss, 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[5] Tara N. Sainath, et al. Low Latency Speech Recognition Using End-to-End Prefetching, 2020, INTERSPEECH.
[6] Andrew W. Senior, et al. Fast and accurate recurrent neural network acoustic models for speech recognition, 2015, INTERSPEECH.
[7] Quoc V. Le, et al. SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition, 2019, INTERSPEECH.
[8] Tara N. Sainath, et al. Endpoint Detection Using Grid Long Short-Term Memory Networks for Streaming Speech Recognition, 2017, INTERSPEECH.
[9] Tara N. Sainath, et al. Lingvo: a Modular and Scalable Framework for Sequence-to-Sequence Modeling, 2019, ArXiv.
[10] Tara N. Sainath, et al. Two-Pass End-to-End Speech Recognition, 2019, INTERSPEECH.
[11] Chengyi Wang, et al. Reducing the Latency of End-to-End Streaming Speech Recognition Models with a Scout Network, 2020.
[12] Wei Li, et al. Parallel Rescoring with Transformer for Streaming On-Device Speech Recognition, 2020, INTERSPEECH.
[13] Tom Bagby, et al. Sampled Connectionist Temporal Classification, 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[14] Tara N. Sainath, et al. Multi-Dialect Speech Recognition with a Single Sequence-to-Sequence Model, 2017, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[15] Arun Narayanan, et al. Toward Domain-Invariant Speech Recognition via Large Scale Training, 2018, 2018 IEEE Spoken Language Technology Workshop (SLT).
[16] Tara N. Sainath, et al. Streaming End-to-end Speech Recognition for Mobile Devices, 2018, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[17] Sanjeev Khudanpur, et al. Librispeech: An ASR corpus based on public domain audio books, 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[18] Tara N. Sainath, et al. Universal ASR: Unify and Improve Streaming ASR with Full-context Modeling, 2020, ArXiv.
[19] Tara N. Sainath, et al. Emitting Word Timings with End-to-End Models, 2020, INTERSPEECH.
[20] Tara N. Sainath, et al. Minimum Word Error Rate Training for Attention-Based Sequence-to-Sequence Models, 2017, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[21] Kjell Schubert, et al. Transformer-Transducer: End-to-End Speech Recognition with Self-Attention, 2019, ArXiv.
[22] Tara N. Sainath, et al. Towards Fast and Accurate Streaming End-To-End ASR, 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[23] Chengyi Wang, et al. Low Latency End-to-End Streaming Speech Recognition with a Scout Network, 2020, INTERSPEECH.
[24] Alex Graves, et al. Sequence Transduction with Recurrent Neural Networks, 2012, ArXiv.
[25] Yu Zhang, et al. Conformer: Convolution-augmented Transformer for Speech Recognition, 2020, INTERSPEECH.
[26] Yonghui Wu, et al. ContextNet: Improving Convolutional Neural Networks for Automatic Speech Recognition with Global Context, 2020, INTERSPEECH.