Cold-Start Data Selection for Few-shot Language Model Fine-tuning: A Prompt-Based Uncertainty Propagation Approach

Large Language Models have demonstrated remarkable few-shot performance, but this performance can be sensitive to the choice of few-shot instances. We propose PATRON, a new method that uses prompt-based uncertainty estimation to select data for pre-trained language model fine-tuning under cold-start scenarios, i.e., when no initial labeled data are available. In PATRON, we design (1) a prompt-based uncertainty propagation approach to estimate the importance of data points and (2) a partition-then-rewrite (PTR) strategy to promote sample diversity when querying for annotations. Experiments on six text classification datasets show that PATRON outperforms the strongest cold-start data selection baselines by up to 6.9%. Moreover, with only 128 labels, PATRON achieves 91.0% and 92.1% of the fully supervised performance with vanilla fine-tuning and prompt-based learning, respectively. Our implementation of PATRON is available at \url{https://github.com/yueyu1030/Patron}.
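The abstract describes the two components only at a high level. The sketch below illustrates one plausible reading of cold-start selection with prompt-based uncertainty and a cluster-based diversity step; it is not the released PATRON implementation (see the repository above), and all function names, the k-nearest-neighbor smoothing, and the use of KMeans are illustrative assumptions.

```python
# Illustrative sketch only -- the actual PATRON code is at
# https://github.com/yueyu1030/Patron; the details below are assumptions.
import numpy as np
from scipy.stats import entropy
from sklearn.cluster import KMeans
from sklearn.neighbors import NearestNeighbors


def propagated_uncertainty(probs, embeddings, k=10):
    """Entropy of zero-shot prompt predictions, smoothed over each point's
    k nearest neighbors in embedding space (a simple stand-in for the
    paper's uncertainty-propagation step)."""
    point_unc = entropy(probs.T)                      # (n,) entropy per example
    nn = NearestNeighbors(n_neighbors=k).fit(embeddings)
    _, idx = nn.kneighbors(embeddings)                # (n, k) neighbor indices
    return point_unc[idx].mean(axis=1)                # neighborhood-averaged uncertainty


def select_cold_start(probs, embeddings, budget):
    """Partition the pool into `budget` clusters and take the most uncertain
    example from each cluster, so the annotation queries are both
    informative and diverse."""
    unc = propagated_uncertainty(probs, embeddings)
    clusters = KMeans(n_clusters=budget, n_init=10).fit_predict(embeddings)
    picks = []
    for c in range(budget):
        members = np.flatnonzero(clusters == c)
        picks.append(members[np.argmax(unc[members])])
    return np.array(picks)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    probs = rng.dirichlet(np.ones(4), size=1000)      # zero-shot prompt class probabilities
    embeddings = rng.normal(size=(1000, 64))          # e.g., sentence embeddings of the pool
    print(select_cold_start(probs, embeddings, budget=32))
```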
