Polyjuice: Automated, General-purpose Counterfactual Generation

Counterfactual examples have been shown to be useful for many applications, including calibrating, evaluating, and explaining model decision boundaries. However, previous methods for generating such counterfactuals have been tightly tailored to specific applications, have relied on a limited range of linguistic patterns, or have been hard to scale. We propose to disentangle counterfactual generation from its use cases, i.e., to gather general-purpose counterfactuals first, and then select them for specific applications. We frame automated counterfactual generation as a text generation task, and fine-tune GPT-2 into a generator, Polyjuice, which produces fluent and diverse counterfactuals. Our method also allows control over where perturbations happen and what they do. We show that Polyjuice supports multiple use cases: by generating diverse counterfactuals for humans to label, Polyjuice helps produce high-quality datasets for model training and evaluation, requiring 40% less human effort. When used to generate explanations, Polyjuice helps augment feature attribution methods to reveal models’ erroneous behaviors.
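To make the controlled-generation setup concrete, the sketch below shows how a GPT-2-style counterfactual generator could be prompted with HuggingFace Transformers. The checkpoint name, prompt format, special tokens, and the control-code vocabulary (e.g., `negation`) are illustrative assumptions rather than the paper's exact scheme; the idea it demonstrates is that a single input sequence encodes the original sentence, a control code specifying what the perturbation does, and a blanked sentence specifying where it happens.

```python
# Illustrative sketch of prompting a conditional counterfactual generator.
# NOTE: the checkpoint, prompt template, control codes, and special tokens
# below are assumptions for illustration, not the paper's exact setup.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"  # placeholder; substitute a Polyjuice-style fine-tuned checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)

def generate_counterfactuals(text, ctrl_code="negation", blanked=None, n=3):
    """Condition generation on (a) a control code saying *what* the perturbation
    does and (b) a blanked sentence saying *where* it should happen."""
    blanked = blanked if blanked is not None else text
    # Hypothetical prompt template: original <|perturb|> [control_code] blanked
    prompt = f"{text} <|perturb|> [{ctrl_code}] {blanked}"
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(
        **inputs,
        do_sample=True,           # sampling encourages diverse candidates
        top_p=0.9,
        num_return_sequences=n,
        max_new_tokens=40,
        pad_token_id=tokenizer.eos_token_id,
    )
    return [tokenizer.decode(o, skip_special_tokens=True) for o in outputs]

if __name__ == "__main__":
    for cf in generate_counterfactuals("It is great for kids.", ctrl_code="negation"):
        print(cf)
```

Sampling (rather than greedy decoding) is what yields the diverse candidate set that humans or downstream selectors can then filter for a specific application such as data augmentation, evaluation, or explanation.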
