Streaming On-Device End-to-End ASR System for Privacy-Sensitive Voice-Typing
Young-Yoon Lee | Gowtham P. Vadisetti | Dhananjaya N. Gowda | Chanwoo Kim | Junmo Park | Abhinav Garg | Kwangyoun Kim | Youngho Han | Aditya Jayasimha | Jiyeon Kim | Kyungbo Min | Sooyeon Kim | Sichen Jin | Kyung-Joong Min
[1] Yoshua Bengio, et al. End-to-end attention-based large vocabulary speech recognition, 2015, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[2] Daehyun Kim, et al. Attention Based On-Device Streaming Speech Recognition with Large Speech Corpus, 2019, 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
[3] Ankur Kumar, et al. Utterance Invariant Training for Hybrid Two-Pass End-to-End Speech Recognition, 2020, INTERSPEECH.
[4] Dongsoo Lee, et al. DeepTwist: Learning Model Compression via Occasional Weight Distortion, 2018, ArXiv.
[5] Janet M. Baker, et al. The Design for the Wall Street Journal-based CSR Corpus, 1992, HLT.
[6] Yuan Yu, et al. TensorFlow: A system for large-scale machine learning, 2016, OSDI.
[7] Pete Warden, et al. Speech Commands: A Dataset for Limited-Vocabulary Speech Recognition, 2018, ArXiv.
[8] Daniel Povey, et al. The Kaldi Speech Recognition Toolkit, 2011.
[9] Yoshua Bengio, et al. On Using Monolingual Corpora in Neural Machine Translation, 2015, ArXiv.
[10] Tara N. Sainath, et al. A Comparison of Sequence-to-Sequence Models for Speech Recognition, 2017, INTERSPEECH.
[11] Colin Raffel, et al. Online and Linear-Time Attention by Enforcing Monotonic Alignments, 2017, ICML.
[12] Yonghong Yan, et al. Online Hybrid CTC/Attention Architecture for End-to-End Speech Recognition, 2019, INTERSPEECH.
[13] Tara N. Sainath, et al. Streaming End-to-end Speech Recognition for Mobile Devices, 2018, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[14] Ankur Kumar, et al. Utterance Confidence Measure for End-to-End Speech Recognition with Applications to Distributed Speech Recognition Scenarios, 2020, INTERSPEECH.
[15] Quoc V. Le, et al. Listen, attend and spell: A neural network for large vocabulary conversational speech recognition, 2015, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[16] Bongshin Lee, et al. Voice typing: a new speech interaction model for dictation on touchscreen devices, 2012, CHI.
[17] Chanwoo Kim, et al. Hierarchical Multi-Stage Word-to-Grapheme Named Entity Corrector for Automatic Speech Recognition, 2020, INTERSPEECH.
[18] Hideyuki Tachibana, et al. Efficiently Trainable Text-to-Speech System Based on Deep Convolutional Networks with Guided Attention, 2017, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[19] Tara N. Sainath, et al. Generation of Large-Scale Simulated Utterances in Virtual Rooms to Train Deep-Neural Networks for Far-Field Speech Recognition in Google Home, 2017, INTERSPEECH.
[20] Sanjeev Khudanpur, et al. Librispeech: An ASR corpus based on public domain audio books, 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[21] Hermann Ney, et al. RETURNN as a Generic Flexible Neural Toolkit with Application to Translation and Speech Recognition, 2018, ACL.
[22] Dhananjaya N. Gowda, et al. Improved Vocal Tract Length Perturbation for a State-of-the-Art End-to-End Speech Recognition System, 2019, INTERSPEECH.
[23] Quoc V. Le, et al. A Neural Transducer, 2015, arXiv:1511.04868.
[24] Rohit Prabhavalkar, et al. On the Choice of Modeling Unit for Sequence-to-Sequence Speech Recognition, 2019, INTERSPEECH.
[25] Li Shuangfeng, et al. TensorFlow Lite: On-Device Machine Learning Framework, 2020.
[26] Dhananjaya N. Gowda, et al. End-to-End Training of a Large Vocabulary End-to-End Speech Recognition System, 2019, 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
[27] Ankur Kumar, et al. Improved Multi-Stage Training of Online Attention-Based Encoder-Decoder Models, 2019, 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
[28] Sathish Reddy Indurthi, et al. Small Energy Masking for Improved Neural Network Training for End-To-End Speech Recognition, 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[29] Martín Abadi, et al. TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems, 2016, ArXiv.
[30] Tara N. Sainath, et al. State-of-the-Art Speech Recognition with Sequence-to-Sequence Models, 2017, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[31] Tara N. Sainath, et al. Minimum Word Error Rate Training for Attention-Based Sequence-to-Sequence Models, 2017, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[32] Yu Zhang, et al. Advances in Joint CTC-Attention Based End-to-End Speech Recognition with a Deep CNN Encoder and RNN-LM, 2017, INTERSPEECH.
[33] Colin Raffel, et al. Monotonic Chunkwise Attention, 2017, ICLR.
[34] Quoc V. Le, et al. SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition, 2019, INTERSPEECH.