RealFormer: Transformer Likes Residual Attention
Ruining He | Joshua Ainslie | Anirudh Ravula | Bhargav Kanagal