Ankur Bapna | Orhan Firat | Minh-Thang Luong | Maxim Krikun | Dmitry Lepikhin | Sneha Kudugunta | Yanping Huang
[1] Quoc V. Le, et al. GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism, 2018, ArXiv.
[2] Xian Li, et al. Deep Transformers with Latent Depth, 2020, NeurIPS.
[3] Noah A. Smith, et al. Deep Encoder, Shallow Decoder: Reevaluating the Speed-Quality Tradeoff in Machine Translation, 2020, ArXiv.
[4] Naman Goyal, et al. BASE Layers: Simplifying Training of Large, Sparse Models, 2021, ICML.
[5] Victor O. K. Li, et al. Universal Neural Machine Translation for Extremely Low Resource Languages, 2018, NAACL.
[6] Ankur Bapna, et al. Leveraging Monolingual Data with Self-Supervision for Multilingual Neural Machine Translation, 2020, ACL.
[7] Jörg Tiedemann, et al. Emerging Language Spaces Learned From Massively Multilingual Corpora, 2018, DHN.
[8] Mark Chen, et al. Language Models are Few-Shot Learners, 2020, NeurIPS.
[9] Anton Gusev, et al. Learning@home: Crowdsourced Training of Large Neural Networks using Decentralized Mixture-of-Experts, 2020, ArXiv.
[10] Quoc V. Le, et al. BAM! Born-Again Multi-Task Networks for Natural Language Understanding, 2019, ACL.
[11] Geoffrey E. Hinton, et al. Distilling the Knowledge in a Neural Network, 2015, ArXiv.
[12] Jörg Tiedemann, et al. Continuous multilinguality with language vectors, 2016, EACL.
[13] Zhe Zhao, et al. Modeling Task Relationships in Multi-task Learning with Multi-gate Mixture-of-Experts, 2018, KDD.
[14] Lukasz Kaiser, et al. Attention is All you Need, 2017, NIPS.
[15] Di He, et al. Multilingual Neural Machine Translation with Knowledge Distillation, 2019, ICLR.
[16] Quoc V. Le, et al. CondConv: Conditionally Parameterized Convolutions for Efficient Inference, 2019, NeurIPS.
[17] Orhan Firat, et al. GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding, 2020, ICLR.
[18] Noam Shazeer, et al. Adafactor: Adaptive Learning Rates with Sublinear Memory Cost, 2018, ICML.
[19] Ankur Bapna, et al. Share or Not? Learning to Schedule Language-Specific Capacity for Multilingual Translation, 2021, ICLR.
[20] Rich Caruana, et al. Multitask Learning, 1998, Encyclopedia of Machine Learning and Data Mining.
[21] Feifei Zhai, et al. Three Strategies to Improve One-to-Many Multilingual Translation, 2018, EMNLP.
[22] Chris Hokamp, et al. Evaluating the Supervised and Zero-shot Performance of Multi-lingual Translation Models, 2019, WMT.
[23] Ankur Bapna, et al. Investigating Multilingual NMT Representations at Scale, 2019, EMNLP.
[24] Adithya Renduchintala, et al. Multilingual Neural Machine Translation with Deep Encoder and Multiple Shallow Decoders, 2022, EACL.
[25] Joelle Pineau, et al. Conditional Computation in Neural Networks for faster models, 2015, ArXiv.
[26] Colin Raffel, et al. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer, 2019, J. Mach. Learn. Res.
[27] Martin Wattenberg, et al. Google's Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation, 2016, TACL.
[28] Ed H. Chi, et al. SNR: Sub-Network Routing for Flexible Parameter Sharing in Multi-Task Learning, 2019, AAAI.
[29] Mark Dredze, et al. Beto, Bentz, Becas: The Surprising Cross-Lingual Effectiveness of BERT, 2019, EMNLP.
[30] Matt Post, et al. A Call for Clarity in Reporting BLEU Scores, 2018, WMT.
[31] Timothy T. Baldwin, et al. Transfer of Training: A Review and Directions for Future Research, 1988.
[32] Tao Zhang, et al. A Survey of Model Compression and Acceleration for Deep Neural Networks, 2017, ArXiv.
[33] Geoffrey E. Hinton, et al. Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer, 2017, ICLR.
[34] Jason Weston, et al. A unified architecture for natural language processing: deep neural networks with multitask learning, 2008, ICML.
[35] Jakob Uszkoreit, et al. Large Scale Parallel Document Mining for Machine Translation, 2010, COLING.
[36] Edouard Grave, et al. Depth-Adaptive Transformer, 2020, ICLR.
[37] Ankur Bapna, et al. Simple, Scalable Adaptation for Neural Machine Translation, 2019, EMNLP.
[38] Rico Sennrich, et al. Domain, Translationese and Noise in Synthetic Data for Neural Machine Translation, 2019, ArXiv.
[39] Holger Schwenk, et al. Beyond English-Centric Multilingual Machine Translation, 2020, J. Mach. Learn. Res.
[40] Noam Shazeer, et al. Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity, 2021, ArXiv.
[41] Markus Freitag, et al. APE at Scale and Its Implications on MT Evaluation Biases, 2019, WMT.
[42] Rico Sennrich, et al. Improving Massively Multilingual Neural Machine Translation and Zero-Shot Translation, 2020, ACL.
[43] Taku Kudo, et al. SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing, 2018, EMNLP.
[44] Joachim Bingel, et al. Latent Multi-Task Architecture Learning, 2017, AAAI.