Tara N. Sainath | Ruoming Pang | Anmol Gulati | Yu Zhang | James Qin | Bo Li | W. Ronny Huang | Parisa Haghani | Min Ma
[1] John C. Wells, et al. Computer-coding the IPA: a proposed extension of SAMPA, 1995.
[2] Alex Graves, et al. Sequence Transduction with Recurrent Neural Networks, 2012, ArXiv.
[3] Yue Dong, et al. Large-Scale End-to-End Multilingual Speech Recognition and Language Identification with Multi-Task Learning, 2020, INTERSPEECH.
[4] Quoc V. Le, et al. Listen, Attend and Spell, 2015, ArXiv.
[5] Alec Radford, et al. Scaling Laws for Neural Language Models, 2020, ArXiv.
[6] Qian Zhang, et al. Transformer Transducer: A Streamable Speech Recognition Model with Transformer Encoders and RNN-T Loss, 2020, ICASSP.
[7] E. Vajda. Handbook of the International Phonetic Association: A Guide to the Use of the International Phonetic Alphabet, 2000.
[8] Tara N. Sainath, et al. Lingvo: a Modular and Scalable Framework for Sequence-to-Sequence Modeling, 2019, ArXiv.
[9] Yonghui Wu, et al. ContextNet: Improving Convolutional Neural Networks for Automatic Speech Recognition with Global Context, 2020, INTERSPEECH.
[10] Martín Abadi, et al. TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems, 2016, ArXiv.
[11] Graham Neubig, et al. Balancing Training for Multilingual Neural Machine Translation, 2020, ACL.
[12] Gabriel Synnaeve, et al. MLS: A Large-Scale Multilingual Dataset for Speech Research, 2020, INTERSPEECH.
[13] Brian Kingsbury, et al. Advancing RNN Transducer Technology for Speech Recognition, 2021, ICASSP.
[14] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[15] Tara N. Sainath, et al. A Better and Faster End-to-End Model for Streaming ASR, 2021, ICASSP.
[16] Yu Zhang, et al. Conformer: Convolution-augmented Transformer for Speech Recognition, 2020, INTERSPEECH.
[17] Rich Caruana, et al. Multitask Learning, 1998, Encyclopedia of Machine Learning and Data Mining.
[18] Tara N. Sainath, et al. Fundamental Technologies in Modern Speech Recognition, 2012, IEEE Signal Processing Magazine (DOI: 10.1109/MSP.2012.2205597).
[19] Gabriel Synnaeve, et al. Massively Multilingual ASR: 50 Languages, 1 Model, 1 Billion Parameters, 2020, INTERSPEECH.
[20] Quoc V. Le, et al. Pushing the Limits of Semi-Supervised Learning for Automatic Speech Recognition, 2020, ArXiv.
[21] Tara N. Sainath, et al. Multi-Dialect Speech Recognition with a Single Sequence-to-Sequence Model, 2018, ICASSP.
[22] Shuang Xu, et al. Multilingual End-to-End Speech Recognition with a Single Transformer on Low-Resource Languages, 2018, ArXiv.
[23] Tara N. Sainath, et al. Bytes Are All You Need: End-to-End Multilingual Speech Recognition and Synthesis with Bytes, 2019, ICASSP.
[24] Ronan Collobert, et al. Unsupervised Cross-lingual Representation Learning for Speech Recognition, 2020, INTERSPEECH.
[25] Tara N. Sainath, et al. Streaming End-to-End Speech Recognition for Mobile Devices, 2019, ICASSP.
[26] Mark Chen, et al. Language Models are Few-Shot Learners, 2020, NeurIPS.
[27] Lukasz Kaiser, et al. Attention Is All You Need, 2017, NIPS.
[28] Yulia Tsvetkov, et al. Gradient Vaccine: Investigating and Improving Multi-task Optimization in Massively Multilingual Models, 2020, ICLR.
[29] Noam Shazeer, et al. Adafactor: Adaptive Learning Rates with Sublinear Memory Cost, 2018, ICML.
[30] Tom M. Mitchell, et al. The Need for Biases in Learning Generalizations, 2007.
[31] Ming-Wei Chang, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.
[32] Hakan Bilen, et al. Knowledge Distillation for Multi-task Learning, 2020, ECCV Workshops.
[33] Simon King, et al. ASR - Articulatory Speech Recognition, 2001, INTERSPEECH.
[34] David Yarowsky, et al. Massively Multilingual Adversarial Speech Recognition, 2019, NAACL.
[35] Orhan Firat, et al. GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding, 2020, ICLR.
[36] Tara N. Sainath, et al. Large-Scale Multilingual Speech Recognition with a Streaming End-to-End Model, 2019, INTERSPEECH.
[37] Shinji Watanabe, et al. Joint CTC-Attention Based End-to-End Speech Recognition Using Multi-Task Learning, 2017, ICASSP.
[38] James Hieronymus. ASCII Phonetic Symbols for the World's Languages: Worldbet, 1994.
[39] Andrew W. Senior, et al. Long Short-Term Memory Recurrent Neural Network Architectures for Large Scale Acoustic Modeling, 2014, INTERSPEECH.
[40] Yashesh Gaur, et al. On the Comparison of Popular End-to-End Models for Large Scale Speech Recognition, 2020, INTERSPEECH.
[41] Quoc V. Le, et al. SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition, 2019, INTERSPEECH.
[42] Ankur Bapna, et al. Massively Multilingual Neural Machine Translation in the Wild: Findings and Challenges, 2019, ArXiv.
[43] Ekapol Chuangsuwanich, et al. Multilingual Techniques for Low Resource Automatic Speech Recognition, 2016.