[1] Surya Ganguli, et al. Exponential expressivity in deep neural networks through transient chaos, 2016, NIPS.
[2] Mona Attariyan, et al. Parameter-Efficient Transfer Learning for NLP, 2019, ICML.
[3] Amos J. Storkey, et al. How to train your MAML, 2018, ICLR.
[4] Edouard Grave, et al. Reducing Transformer Depth on Demand with Structured Dropout, 2019, ICLR.
[5] Ankur Bapna, et al. Training Deeper Neural Machine Translation Models with Transparent Attention, 2018, EMNLP.
[6] Le Song, et al. Meta Architecture Search, 2019, NeurIPS.
[7] Eva Schlinger, et al. How Multilingual is Multilingual BERT?, 2019, ACL.
[8] Rico Sennrich, et al. Improving Massively Multilingual Neural Machine Translation and Zero-Shot Translation, 2020, ACL.
[9] Yiming Yang, et al. XLNet: Generalized Autoregressive Pretraining for Language Understanding, 2019, NeurIPS.
[10] Jingbo Zhu, et al. Learning Deep Transformer Models for Machine Translation, 2019, ACL.
[11] Luc Van Gool, et al. Branched Multi-Task Networks: Deciding what layers to share, 2019, BMVC.
[12] Kilian Q. Weinberger, et al. Deep Networks with Stochastic Depth, 2016, ECCV.
[13] Ben Poole, et al. Categorical Reparameterization with Gumbel-Softmax, 2016, ICLR.
[14] Garrison W. Cottrell, et al. ReZero is All You Need: Fast Convergence at Large Depth, 2020, UAI.
[15] Graham Neubig, et al. When and Why Are Pre-Trained Word Embeddings Useful for Neural Machine Translation?, 2018, NAACL.
[16] Iain Murray, et al. BERT and PALs: Projected Attention Layers for Efficient Adaptation in Multi-Task Learning, 2019, ICML.
[17] Yulia Tsvetkov, et al. Balancing Training for Multilingual Neural Machine Translation, 2020, ACL.
[18] Ilya Sutskever, et al. Language Models are Unsupervised Multitask Learners, 2019.
[19] Myle Ott, et al. Scaling Neural Machine Translation, 2018, WMT.
[20] Orhan Firat, et al. GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding, 2020, ICLR.
[21] Andrew J. Davison, et al. End-To-End Multi-Task Learning With Attention, 2019, CVPR.
[22] Rico Sennrich, et al. Neural Machine Translation of Rare Words with Subword Units, 2015, ACL.
[23] Tengyu Ma, et al. Fixup Initialization: Residual Learning Without Normalization, 2019, ICLR.
[24] Veselin Stoyanov, et al. Unsupervised Cross-lingual Representation Learning at Scale, 2019, ACL.
[25] Quoc V. Le, et al. Towards a Human-like Open-Domain Chatbot, 2020, arXiv.
[26] Lukasz Kaiser, et al. Attention is All you Need, 2017, NIPS.
[27] Ankur Bapna, et al. Massively Multilingual Neural Machine Translation in the Wild: Findings and Challenges, 2019, arXiv.
[28] Weihua Luo, et al. Multiscale Collaborative Deep Models for Neural Machine Translation, 2020, ACL.
[29] Artem Molchanov, et al. Generalized Inner Loop Meta-Learning, 2019, arXiv.
[30] Rogério Schmidt Feris, et al. SpotTune: Transfer Learning Through Adaptive Fine-Tuning, 2019, CVPR.
[31] Matthew Riemer, et al. Routing Networks: Adaptive Selection of Non-linear Functions for Multi-Task Learning, 2017, ICLR.
[32] Geoffrey E. Hinton, et al. Layer Normalization, 2016, arXiv.
[33] Myle Ott, et al. fairseq: A Fast, Extensible Toolkit for Sequence Modeling, 2019, NAACL.
[34] Ankur Bapna, et al. Simple, Scalable Adaptation for Neural Machine Translation, 2019, EMNLP.
[35] Marjan Ghazvininejad, et al. Multilingual Denoising Pre-training for Neural Machine Translation, 2020, TACL.
[36] Rico Sennrich, et al. Improving Deep Transformer with Depth-Scaled Initialization and Merged Attention, 2019, EMNLP.
[37] Matt Post, et al. A Call for Clarity in Reporting BLEU Scores, 2018, WMT.
[38] Larry S. Davis, et al. BlockDrop: Dynamic Inference Paths in Residual Networks, 2018, CVPR.