[1] Kevin Gimpel, et al. Gaussian Error Linear Units (GELUs), 2016.
[2] Brian McWilliams, et al. The Shattered Gradients Problem: If resnets are the answer, then what is the question?, 2017, ICML.
[3] Jiawei Han, et al. Understanding the Difficulty of Training Transformers, 2020, EMNLP.
[4] Omer Levy, et al. Improving Transformer Models by Reordering their Sublayers, 2020, ACL.
[5] Zhe Gan, et al. EarlyBERT: Efficient BERT Training via Early-bird Lottery Tickets, 2020, ACL.
[6] Liwei Wang, et al. On Layer Normalization in the Transformer Architecture, 2020, ICML.
[7] Natalia Gimelshein, et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library, 2019, NeurIPS.
[8] Ivan Vulic, et al. Unsupervised Cross-Lingual Representation Learning, 2019, ACL.
[9] Tom Goldstein, et al. GradInit: Learning to Initialize Neural Networks for Stable and Efficient Training, 2021, NeurIPS.
[10] Garrison W. Cottrell, et al. ReZero is All You Need: Fast Convergence at Large Depth, 2020, UAI.
[11] Omer Levy, et al. RoBERTa: A Robustly Optimized BERT Pretraining Approach, 2019, arXiv.
[12] Mark Chen, et al. Language Models are Few-Shot Learners, 2020, NeurIPS.
[13] Lukasz Kaiser, et al. Attention is All you Need, 2017, NIPS.
[14] Colin Raffel, et al. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer, 2019, J. Mach. Learn. Res.
[15] Chang Zhou, et al. CogView: Mastering Text-to-Image Generation via Transformers, 2021, NeurIPS.
[16] Noah A. Smith, et al. Shortformer: Better Language Modeling using Shorter Inputs, 2021, ACL.
[17] Maksims Volkovs, et al. Improving Transformer Optimization Through Better Initialization, 2020, ICML.
[18] Yejin Choi, et al. WINOGRANDE: An Adversarial Winograd Schema Challenge at Scale, 2020, AAAI.
[19] Hao Wu, et al. Mixed Precision Training, 2017, ICLR.
[20] Ilya Sutskever, et al. Zero-Shot Text-to-Image Generation, 2021, ICML.
[21] Yejin Choi, et al. PIQA: Reasoning about Physical Commonsense in Natural Language, 2019, AAAI.
[22] Quoc V. Le, et al. Primer: Searching for Efficient Transformers for Language Modeling, 2021, NeurIPS.
[23] Veselin Stoyanov, et al. Unsupervised Cross-lingual Representation Learning at Scale, 2019, ACL.
[24] Ali Farhadi, et al. HellaSwag: Can a Machine Really Finish Your Sentence?, 2019, ACL.
[25] Myle Ott, et al. fairseq: A Fast, Extensible Toolkit for Sequence Modeling, 2019, NAACL.
[26] Sanja Fidler, et al. Aligning Books and Movies: Towards Story-Like Visual Explanations by Watching Movies and Reading Books, 2015, ICCV.
[27] Nathanael Chambers, et al. A Corpus and Cloze Evaluation for Deeper Understanding of Commonsense Stories, 2016, NAACL.
[28] Noam Shazeer, et al. GLU Variants Improve Transformer, 2020, arXiv.
[29] Omer Levy, et al. GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding, 2018, BlackboxNLP@EMNLP.
[30] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2015, CVPR.
[31] Peter Clark, et al. Can a Suit of Armor Conduct Electricity? A New Dataset for Open Book Question Answering, 2018, EMNLP.
[32] Alexei Baevski, et al. Adaptive Input Representations for Neural Language Modeling, 2018, ICLR.
[33] Ilya Sutskever, et al. Language Models are Unsupervised Multitask Learners, 2019.