Liu Yang | Jinfeng Rao | Yi Tay | Sebastian Ruder | Dara Bahri | Mostafa Dehghani | Donald Metzler | Samira Abnar | Yikang Shen | Philip Pham