Training Language Models with Memory Augmentation