Fixing Model Bugs with Natural Language Patches

Current approaches for fixing systematic problems in NLP models (e.g., regexes, finetuning on more data) are either brittle, or labor-intensive and liable to shortcuts. In contrast, humans often provide corrections to each other through natural language. Taking inspiration from this, we explore natural language patches: declarative statements that allow developers to provide corrective feedback at the right level of abstraction, either overriding the model ("if a review gives 2 stars, the sentiment is negative") or providing additional information the model may lack ("if something is described as the bomb, then it is good"). We model the task of determining whether a patch applies separately from the task of integrating patch information, and show that with a small amount of synthetic data we can teach models to effectively use real patches on real data: 1 to 7 patches improve accuracy by ~1–4 points on different slices of a sentiment analysis dataset, and improve F1 by 7 points on a relation extraction dataset. Finally, we show that finetuning on as many as 100 labeled examples may be needed to match the performance of a small set of language patches.
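The abstract describes a two-stage scheme: first decide whether a patch's condition applies to the input, then either override the model's output or integrate the patch's information into the prediction. The sketch below illustrates that control flow only; the function names, the `Patch` structure, and the keyword-based condition checks are illustrative assumptions, not the paper's actual implementation (which uses finetuned models for both stages).

```python
# Minimal sketch of two-stage natural language patching, assuming a
# hypothetical Patch structure with a condition plus either an override
# label or extra context for the model to integrate.
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Patch:
    applies: Callable[[str], bool]   # stage 1: does the patch apply?
    override: Optional[str] = None   # "if ... then the label is X"
    context: Optional[str] = None    # "if ... then note that ..."


def patched_predict(text: str,
                    model: Callable[[str], str],
                    patches: List[Patch]) -> str:
    """Gate each patch on the input; an override patch replaces the
    prediction, an information patch augments the model's input."""
    augmented = text
    for p in patches:
        if p.applies(text):
            if p.override is not None:
                return p.override                 # hard override
            augmented += f" [note: {p.context}]"  # soft integration
    return model(augmented)


# Toy usage: a stand-in "model" that only recognizes the word "good".
toy_model = lambda t: "positive" if "good" in t else "negative"
bomb_patch = Patch(applies=lambda t: "the bomb" in t,
                   context="'the bomb' means it is good")
print(patched_predict("This album is the bomb", toy_model, [bomb_patch]))
# -> "positive": the injected note supplies "good" to the toy model
```

Separating the "does it apply?" decision from the "how to use it?" step mirrors the paper's framing and keeps patches composable: an inapplicable patch leaves the input untouched.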
