Dual-mode ASR: Unify and Improve Streaming ASR with Full-context Modeling
Jiahui Yu | Wei Han | Anmol Gulati | Chung-Cheng Chiu | Bo Li | Tara N. Sainath | Yonghui Wu | Ruoming Pang