Preserving In-Context Learning ability in Large Language Model Fine-tuning
Cho-Jui Hsieh | I. Dhillon | Si Si | M. Lukasik | Yihan Wang | Daliang Li | Felix Yu | Sanjiv Kumar