How many data points is a prompt worth?

When fine-tuning pretrained models for classification, researchers either use a generic model head or a task-specific prompt for prediction. Proponents of prompting have argued that prompts provide a method for injecting task-specific guidance, which is beneficial in low-data regimes. We aim to quantify this benefit through rigorous testing of prompts in a fair setting: comparing prompted and head-based fine-tuning under equal conditions across many tasks and data sizes. By controlling for many sources of advantage, we find that prompting does indeed provide a benefit, and that this benefit can be quantified per task. Our results show that prompting is often worth hundreds of data points on average across classification tasks.
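
To make the comparison concrete, here is a minimal sketch (not the paper's exact setup) contrasting the two fine-tuning strategies on a single entailment-style example with HuggingFace Transformers: a randomly initialized classification head versus a cloze-style prompt scored with the pretrained masked-language-modeling head. The model name, the pattern string, and the Yes/No verbalizer mapping are illustrative assumptions, not necessarily those used by the authors.

```python
import torch
from transformers import (
    AutoModelForMaskedLM,
    AutoModelForSequenceClassification,
    AutoTokenizer,
)

model_name = "roberta-large"  # illustrative choice
tokenizer = AutoTokenizer.from_pretrained(model_name)

premise = "The cat sat on the mat."
hypothesis = "An animal is on the mat."

# Head-based fine-tuning: a freshly initialized classifier head on top of the
# encoder, fed the sentence pair directly. Its logits carry no task signal
# until the head is trained on labeled data.
head_model = AutoModelForSequenceClassification.from_pretrained(
    model_name, num_labels=2
)
head_inputs = tokenizer(premise, hypothesis, return_tensors="pt")
with torch.no_grad():
    head_logits = head_model(**head_inputs).logits  # shape (1, 2)

# Prompt-based fine-tuning: rewrite the example as a cloze pattern and reuse
# the pretrained MLM head, reading off logits for a small set of label words
# (a "verbalizer"). The pattern and label words below are hypothetical.
mlm_model = AutoModelForMaskedLM.from_pretrained(model_name)
pattern = f"{premise} ? {tokenizer.mask_token} , {hypothesis}"
label_words = [" Yes", " No"]  # mapped to entailment / not-entailment

prompt_inputs = tokenizer(pattern, return_tensors="pt")
mask_pos = (prompt_inputs.input_ids[0] == tokenizer.mask_token_id).nonzero().item()
with torch.no_grad():
    mask_logits = mlm_model(**prompt_inputs).logits[0, mask_pos]

label_ids = [
    tokenizer.convert_tokens_to_ids(tokenizer.tokenize(w)[0]) for w in label_words
]
prompt_logits = mask_logits[label_ids]  # scores over the label words, shape (2,)
```

In the head-based case the new parameters must be learned from scratch, whereas the prompted model can already assign probabilities to the label words before any fine-tuning; this head start is what the paper measures in units of training data points.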
