Extending Context Window of Large Language Models via Positional Interpolation
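For orientation, a minimal sketch of the positional-interpolation idea named in the title, assuming the rotary position embedding (RoPE) of [18]: position indices of an extended context are linearly down-scaled into the model's pretrained range before the rotary angles are computed, instead of extrapolating beyond it. The function names, the head dimension, and the 2048-to-8192 numbers below are illustrative assumptions, not taken from the paper.

    import torch

    def rope_angles(positions, dim, base=10000.0):
        # Standard rotary-embedding angles: theta_i = base^(-2i/dim); angle = position * theta_i.
        inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2, dtype=torch.float32) / dim))
        return torch.outer(positions, inv_freq)  # shape (seq_len, dim // 2)

    def interpolated_positions(seq_len, train_len):
        # Position Interpolation: linearly rescale position indices so an extended
        # context of length seq_len maps back into the pretrained range [0, train_len).
        positions = torch.arange(seq_len, dtype=torch.float32)
        scale = min(1.0, train_len / seq_len)  # identity when seq_len <= train_len
        return positions * scale

    # Illustrative example: extend a model pretrained on 2048 tokens to an 8192-token window.
    angles = rope_angles(interpolated_positions(8192, 2048), dim=128)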
[1] Martin Jaggi, et al. Landmark Attention: Random-Access Infinite Context Length for Transformers, 2023, arXiv.
[2] Myle Ott, et al. PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel, 2023, arXiv.
[3] Noah D. Goodman, et al. Learning to Compress Prompts with Gist Tokens, 2023, arXiv.
[4] David C. Uthus, et al. CoLT5: Faster Long-Range Transformers with Conditional Computation, 2023, arXiv.
[5] Naman Goyal, et al. LLaMA: Open and Efficient Foundation Language Models, 2023, arXiv.
[6] Jamie Callan, et al. Retrieval as Attention: End-to-end Learning of Retrieval and Reading within a Single Transformer, 2022, EMNLP.
[7] Jane A. Yu, et al. Few-shot Learning with Retrieval Augmented Language Models, 2022, J. Mach. Learn. Res.
[8] M. Burtsev, et al. Recurrent Memory Transformer, 2022, NeurIPS.
[9] Daniel Y. Fu, et al. FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness, 2022, NeurIPS.
[10] Xi Victoria Lin, et al. OPT: Open Pre-trained Transformer Language Models, 2022, arXiv.
[11] Omer Levy, et al. Transformer Language Models without Positional Encodings Still Learn Positional Information, 2022, EMNLP.
[12] Markus N. Rabe, et al. Memorizing Transformers, 2022, ICLR.
[13] Omer Levy, et al. SCROLLS: Standardized CompaRison Over Long Language Sequences, 2022, EMNLP.
[14] M. Zaharia, et al. ColBERTv2: Effective and Efficient Retrieval via Lightweight Late Interaction, 2021, NAACL.
[15] Noah A. Smith, et al. Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation, 2021, ICLR.
[16] Jure Leskovec, et al. Combiner: Full Attention Transformer with Sparse Computation Cost, 2021, NeurIPS.
[17] Yelong Shen, et al. LoRA: Low-Rank Adaptation of Large Language Models, 2021, ICLR.
[18] Jianlin Su, et al. RoFormer: Enhanced Transformer with Rotary Position Embedding, 2021, Neurocomputing.
[19] Shuyang Cao, et al. Efficient Attentions for Long Document Summarization, 2021, NAACL.
[20] Charles Foster, et al. The Pile: An 800GB Dataset of Diverse Text for Language Modeling, 2020, arXiv.
[21] S. Gelly, et al. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale, 2020, ICLR.
[22] A. Geramifard, et al. Memformer: A Memory-Augmented Transformer for Sequence Modeling, 2020, AACL/IJCNLP.
[23] Lucy J. Colwell, et al. Rethinking Attention with Performers, 2020, ICLR.
[24] M. Zaheer, et al. Big Bird: Transformers for Longer Sequences, 2020, NeurIPS.
[25] Christopher Potts, et al. Relevance-guided Supervision for OpenQA with ColBERT, 2020, Transactions of the Association for Computational Linguistics.
[26] Han Fang, et al. Linformer: Self-Attention with Linear Complexity, 2020, arXiv.
[27] Danqi Chen, et al. Dense Passage Retrieval for Open-Domain Question Answering, 2020, EMNLP.
[28] Arman Cohan, et al. Longformer: The Long-Document Transformer, 2020, arXiv.
[29] Ming-Wei Chang, et al. REALM: Retrieval-Augmented Language Model Pre-Training, 2020, ICML.
[30] Lukasz Kaiser, et al. Reformer: The Efficient Transformer, 2020, ICLR.
[31] Natalia Gimelshein, et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library, 2019, NeurIPS.
[32] Timothy P. Lillicrap, et al. Compressive Transformers for Long-Range Sequence Modelling, 2019, ICLR.
[33] Ilya Sutskever, et al. Generating Long Sequences with Sparse Transformers, 2019, arXiv.
[34] Yiming Yang, et al. Transformer-XL: Attentive Language Models beyond a Fixed-Length Context, 2019, ACL.
[35] Taku Kudo, et al. SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing, 2018, EMNLP.
[36] Frank Hutter, et al. Decoupled Weight Decay Regularization, 2017, ICLR.
[37] Lukasz Kaiser, et al. Attention is All you Need, 2017, NIPS.
[38] Chin-Yew Lin, et al. ROUGE: A Package for Automatic Evaluation of Summaries, 2004, ACL.
[39] André F. T. Martins, et al. ∞-former: Infinite Memory Transformer, 2022, ACL.