Cold-Start Data Selection for Few-shot Language Model Fine-tuning: A Prompt-Based Uncertainty Propagation Approach

Large Language Models have demonstrated remarkable few-shot performance, but this performance can be sensitive to the choice of few-shot instances. We propose PATRON, a new method that uses prompt-based uncertainty estimation to select data for pre-trained language model fine-tuning under cold-start scenarios, i.e., when no initial labeled data are available. In PATRON, we design (1) a prompt-based uncertainty propagation approach to estimate the importance of data points and (2) a partition-then-rewrite (PTR) strategy to promote sample diversity when querying for annotations. Experiments on six text classification datasets show that PATRON outperforms the strongest cold-start data selection baselines by up to 6.9%. Moreover, with only 128 labels, PATRON achieves 91.0% and 92.1% of the fully supervised performance with vanilla fine-tuning and prompt-based learning, respectively. Our implementation of PATRON is available at \url{https://github.com/yueyu1030/Patron}.
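The abstract describes the two components only at a high level. The sketch below illustrates one plausible reading of cold-start selection with prompt-based uncertainty and a cluster-based diversity step; it is not the released PATRON implementation (see the repository above), and all function names, the k-nearest-neighbor smoothing, and the use of KMeans are illustrative assumptions.

```python
# Illustrative sketch only -- the actual PATRON code is at
# https://github.com/yueyu1030/Patron; the details below are assumptions.
import numpy as np
from scipy.stats import entropy
from sklearn.cluster import KMeans
from sklearn.neighbors import NearestNeighbors


def propagated_uncertainty(probs, embeddings, k=10):
    """Entropy of zero-shot prompt predictions, smoothed over each point's
    k nearest neighbors in embedding space (a simple stand-in for the
    paper's uncertainty-propagation step)."""
    point_unc = entropy(probs.T)                      # (n,) entropy per example
    nn = NearestNeighbors(n_neighbors=k).fit(embeddings)
    _, idx = nn.kneighbors(embeddings)                # (n, k) neighbor indices
    return point_unc[idx].mean(axis=1)                # neighborhood-averaged uncertainty


def select_cold_start(probs, embeddings, budget):
    """Partition the pool into `budget` clusters and take the most uncertain
    example from each cluster, so the annotation queries are both
    informative and diverse."""
    unc = propagated_uncertainty(probs, embeddings)
    clusters = KMeans(n_clusters=budget, n_init=10).fit_predict(embeddings)
    picks = []
    for c in range(budget):
        members = np.flatnonzero(clusters == c)
        picks.append(members[np.argmax(unc[members])])
    return np.array(picks)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    probs = rng.dirichlet(np.ones(4), size=1000)      # zero-shot prompt class probabilities
    embeddings = rng.normal(size=(1000, 64))          # e.g., sentence embeddings of the pool
    print(select_cold_start(probs, embeddings, budget=32))
```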
