AdaMix: Mixture-of-Adapter for Parameter-efficient Tuning of Large Language Models
[1] Ari S. Morcos, et al. Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time, 2022, ICML.
[2] Colin Raffel, et al. Merging Models with Fisher-Weighted Averaging, 2021, NeurIPS.
[3] Amjad Almahairi, et al. UniPELT: A Unified Framework for Parameter-Efficient Language Model Tuning, 2021, ACL.
[4] T. Zhao, et al. Taming Sparsely Activated Transformer with Stochastic Experts, 2021, ICLR.
[5] Yoav Goldberg, et al. BitFit: Simple Parameter-efficient Fine-tuning for Transformer-based Masked Language-models, 2021, ACL.
[6] Yelong Shen, et al. LoRA: Low-Rank Adaptation of Large Language Models, 2021, ICLR.
[7] Jason Weston, et al. Hash Layers For Large Sparse Models, 2021, NeurIPS.
[8] Xianyan Jia, et al. M6-T: Exploring Sparse Expert Models and Beyond, 2021, ArXiv.
[9] Douwe Kiela, et al. True Few-Shot Learning with Language Models, 2021, NeurIPS.
[10] Brian Lester, et al. The Power of Scale for Parameter-Efficient Prompt Tuning, 2021, EMNLP.
[11] Naman Goyal, et al. BASE Layers: Simplifying Training of Large, Sparse Models, 2021, ICML.
[12] Noam M. Shazeer, et al. Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity, 2021, J. Mach. Learn. Res.
[13] Danqi Chen, et al. Making Pre-trained Language Models Better Few-shot Learners, 2021, ACL.
[14] Armen Aghajanyan, et al. Intrinsic Dimensionality Explains the Effectiveness of Language Model Fine-Tuning, 2020, ACL.
[15] Behnam Neyshabur, et al. What is being transferred in transfer learning?, 2020, NeurIPS.
[16] Iryna Gurevych, et al. AdapterHub: A Framework for Adapting Transformers, 2020, EMNLP.
[17] Orhan Firat, et al. GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding, 2020, ICLR.
[18] Kilian Q. Weinberger, et al. Revisiting Few-sample BERT Fine-tuning, 2020, ICLR.
[19] Mark Chen, et al. Language Models are Few-Shot Learners, 2020, NeurIPS.
[20] Iryna Gurevych, et al. AdapterFusion: Non-Destructive Task Composition for Transfer Learning, 2020, EACL.
[21] Daniel M. Roy, et al. Linear Mode Connectivity and the Lottery Ticket Hypothesis, 2019, ICML.
[22] Jimmy J. Lin, et al. What Would Elsa Do? Freezing Layers During Transformer Fine-Tuning, 2019, ArXiv.
[23] Colin Raffel, et al. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer, 2019, J. Mach. Learn. Res.
[24] Omer Levy, et al. RoBERTa: A Robustly Optimized BERT Pretraining Approach, 2019, ArXiv.
[25] Mona Attariyan, et al. Parameter-Efficient Transfer Learning for NLP, 2019, ICML.
[26] Samuel R. Bowman, et al. GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding, 2018, BlackboxNLP@EMNLP.
[27] Andrew Gordon Wilson, et al. Averaging Weights Leads to Wider Optima and Better Generalization, 2018, UAI.
[28] Lukasz Kaiser, et al. Attention is All you Need, 2017, NIPS.
[29] Samuel R. Bowman, et al. A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference, 2017, NAACL.
[30] Geoffrey E. Hinton, et al. Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer, 2017, ICLR.
[31] Sergey Ioffe, et al. Rethinking the Inception Architecture for Computer Vision, 2016, CVPR.
[32] Zoubin Ghahramani, et al. Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning, 2015, ICML.
[33] Christopher Potts, et al. Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank, 2013, EMNLP.
[34] Ido Dagan, et al. The Third PASCAL Recognizing Textual Entailment Challenge, 2007, ACL-PASCAL@ACL.
[35] Bo Pang, et al. A Sentimental Education: Sentiment Analysis Using Subjectivity Summarization Based on Minimum Cuts, 2004, ACL.
[36] Jianfeng Gao, et al. LiST: Lite Self-training Makes Efficient Few-shot Learners, 2021, ArXiv.
[37] Percy Liang, et al. Prefix-Tuning: Optimizing Continuous Prompts for Generation, 2021, ACL.
[38] Ming-Wei Chang, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.
[39] Ido Dagan, et al. The Sixth PASCAL Recognizing Textual Entailment Challenge, 2009, TAC.
[40] Roy Bar-Haim, et al. The Second PASCAL Recognising Textual Entailment Challenge, 2006.
[41] Claire Cardie, et al. Annotating Expressions of Opinions and Emotions in Language, 2005, Lang. Resour. Evaluation.