LoRAMoE: Revolutionizing Mixture of Experts for Maintaining World Knowledge in Language Model Alignment
Tao Gui | Xiao Wang | Shihan Dou | Rui Zheng | Zhiheng Xi | Yuhao Zhou | Songyang Gao | Jun Zhao | Wei Shen | Enyu Zhou | Xiaoran Fan | Qi Zhang | Xuanjing Huang | Yan Liu | Shiliang Pu | Jiang Zhu
[1] Zhangyin Feng, et al. Trends in Integration of Knowledge and Large Language Models: A Survey and Taxonomy of Methods, Benchmarks, and Applications, 2023, ArXiv.
[2] Lianmin Zheng, et al. S-LoRA: Serving Thousands of Concurrent LoRA Adapters, 2023, ArXiv.
[3] Tao Gui, et al. Orthogonal Subspace Learning for Language Model Continual Learning, 2023, EMNLP.
[4] Yuanshao Zhu, et al. MOELoRA: An MOE-based Parameter Efficient Fine-Tuning Method for Multi-task Medical Applications, 2023, ArXiv.
[5] A. Ustun, et al. Pushing Mixture of Experts to the Limit: Extremely Parameter Efficient MoE for Instruction Tuning, 2023, ArXiv.
[6] Bill Yuchen Lin, et al. LoraHub: Efficient Cross-Task Generalization via Dynamic LoRA Composition, 2023, ArXiv.
[7] Roberta Raileanu, et al. Challenges and Applications of Large Language Models, 2023, ArXiv.
[8] Eric Michael Smith, et al. Llama 2: Open Foundation and Fine-Tuned Chat Models, 2023, ArXiv.
[9] Hao Peng, et al. KoLA: Carefully Benchmarking World Knowledge of Large Language Models, 2023, ICLR.
[10] Z. Chen, et al. Lifelong Language Pretraining with Distribution-Specialized Experts, 2023, ICML.
[11] Omri Abend, et al. DisentQA: Disentangling Parametric and Contextual Knowledge with Counterfactual Question Answering, 2022, ACL.
[12] Andrew M. Dai, et al. Scaling Instruction-Finetuned Language Models, 2022, ArXiv.
[13] Hongsheng Li, et al. ST-Adapter: Parameter-Efficient Image-to-Video Transfer Learning for Action Recognition, 2022, NeurIPS.
[14] Colin Raffel, et al. Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning, 2022, NeurIPS.
[15] Li Dong, et al. On the Representation Collapse of Sparse Mixture of Experts, 2022, NeurIPS.
[16] Mona T. Diab, et al. A Review on Language Models as Knowledge Bases, 2022, ArXiv.
[17] Haitao Zheng, et al. Delta Tuning: A Comprehensive Study of Parameter Efficient Methods for Pre-trained Language Models, 2022, ArXiv.
[18] Ryan J. Lowe, et al. Training language models to follow instructions with human feedback, 2022, NeurIPS.
[19] Xipeng Qiu, et al. Black-Box Tuning for Language-Model-as-a-Service, 2022, ICML.
[20] Quoc V. Le, et al. GLaM: Efficient Scaling of Language Models with Mixture-of-Experts, 2021, ICML.
[21] Li Dong, et al. VLMo: Unified Vision-Language Pre-Training with Mixture-of-Modality-Experts, 2021, NeurIPS.
[22] Graham Neubig, et al. Towards a Unified View of Parameter-Efficient Transfer Learning, 2021, ICLR.
[23] Yang You, et al. Go Wider Instead of Deeper, 2021, AAAI.
[24] Yelong Shen, et al. LoRA: Low-Rank Adaptation of Large Language Models, 2021, ICLR.
[25] Carlos Riquelme, et al. Scaling Vision with Sparse Mixture of Experts, 2021, NeurIPS.
[26] Lidong Bing, et al. On the Effectiveness of Adapter-based Tuning for Pretrained Language Model Adaptation, 2021, ACL.
[27] Brian Lester, et al. The Power of Scale for Parameter-Efficient Prompt Tuning, 2021, EMNLP.
[28] Noam M. Shazeer, et al. Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity, 2021, J. Mach. Learn. Res.
[29] Sebastian Riedel, et al. Question and Answer Test-Train Overlap in Open-Domain Question Answering Datasets, 2020, EACL.
[30] Orhan Firat, et al. GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding, 2020, ICLR.
[31] Jianfeng Gao, et al. The Microsoft Toolkit of Multi-Task Deep Neural Networks for Natural Language Understanding, 2020, ACL.
[32] Colin Raffel, et al. How Much Knowledge Can You Pack into the Parameters of a Language Model?, 2020, EMNLP.
[33] Zijian Wang, et al. Answering Complex Open-domain Questions Through Iterative Query Generation, 2019, EMNLP.
[34] Sebastian Riedel, et al. Language Models as Knowledge Bases?, 2019, EMNLP.
[35] Ming-Wei Chang, et al. Natural Questions: A Benchmark for Question Answering Research, 2019, TACL.
[36] Sung Ju Hwang, et al. Episodic Memory Reader: Learning What to Remember for Question Answering from Streaming Data, 2019, ACL.
[37] Philipp Koehn, et al. The FLORES Evaluation Datasets for Low-Resource Machine Translation: Nepali–English and Sinhala–English, 2019, EMNLP.
[38] Mona Attariyan, et al. Parameter-Efficient Transfer Learning for NLP, 2019, ICML.
[39] Xiaodong Liu, et al. ReCoRD: Bridging the Gap between Human and Machine Commonsense Reading Comprehension, 2018, ArXiv.
[40] Mirella Lapata, et al. Don’t Give Me the Details, Just the Summary! Topic-Aware Convolutional Neural Networks for Extreme Summarization, 2018, EMNLP.
[41] Dan Roth, et al. Looking Beyond the Surface: A Challenge Set for Reading Comprehension over Multiple Sentences, 2018, NAACL.
[42] Mitesh M. Khapra, et al. DuoRC: Towards Complex Language Understanding with Paraphrased Reading Comprehension, 2018, ACL.
[43] Neal J. Cohen, et al. A Closer Look at the Hippocampus and Memory, 2017, Trends in Cognitive Sciences.
[44] Guokun Lai, et al. RACE: Large-scale ReAding Comprehension Dataset From Examinations, 2017, EMNLP.
[45] Zhiguo Wang, et al. Bilateral Multi-Perspective Matching for Natural Language Sentences, 2017, IJCAI.
[46] Geoffrey E. Hinton, et al. Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer, 2017, ICLR.
[47] Xiang Zhang, et al. Character-level Convolutional Networks for Text Classification, 2015, NIPS.
[48] Hector J. Levesque, et al. The Winograd Schema Challenge, 2011, AAAI Spring Symposium: Logical Formalizations of Commonsense Reasoning.
[49] E. Rolls, et al. Computational analysis of the role of the hippocampus in memory, 1994, Hippocampus.
[50] Geoffrey E. Hinton, et al. Adaptive Mixtures of Local Experts, 1991, Neural Computation.
[51] Yihan Cao, et al. Instruction Mining: High-Quality Instruction Data Selection for Large Language Models, 2023, ArXiv.
[52] Dragomir R. Radev, et al. Crosslingual Generalization through Multitask Finetuning, 2023, ACL.
[53] Yejin Choi, et al. WinoGrande: An Adversarial Winograd Schema Challenge at Scale, 2019.