LoRA-FA: Memory-efficient Low-rank Adaptation for Large Language Models Fine-tuning
Longteng Zhang | Lin Zhang | Shaohuai Shi | X. Chu | Bo Li