Robustness to Spurious Correlations in Text Classification via Automatically Generated Counterfactuals

Spurious correlations threaten the validity of statistical classifiers. While model accuracy may appear high when the test data is drawn from the same distribution as the training data, it can degrade quickly when the test distribution changes. For example, it has been shown that classifiers perform poorly when humans make minor modifications that change the label of an example. One way to increase model reliability and generalizability is to identify causal associations between features and classes. In this paper, we propose to train a robust text classifier by augmenting the training data with automatically generated counterfactual data. We first identify likely causal features using a statistical matching approach. Next, we generate counterfactual samples for the original training data by substituting causal features with their antonyms and assigning the opposite labels to the resulting samples. Finally, we combine the original data and the counterfactual data to train a robust classifier. Experiments on two classification tasks show that a traditional classifier trained on the original data performs very poorly on human-generated counterfactual samples (e.g., 10%-37% drop in accuracy). However, the classifier trained on the combined data is more robust and performs well on both the original test data and the counterfactual test data (e.g., 12%-25% increase in accuracy compared with the traditional classifier). Detailed analysis shows that the robust classifier makes meaningful and trustworthy predictions by emphasizing causal features and de-emphasizing non-causal features.
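
The augmentation step described above (substitute causal features with antonyms, flip the label, then merge with the original data) can be illustrated with a minimal sketch. This is not the paper's implementation: the `ANTONYMS` table, the function names `make_counterfactual` and `augment`, the simple regex tokenizer, and the binary 0/1 label encoding are all assumptions made for illustration. The set of causal terms is passed in directly here, whereas in the paper it is identified by a statistical matching approach.

```python
import re

# Hypothetical antonym lookup (an assumption for this sketch); a real system
# would draw antonyms from a lexical resource rather than a hand-written table.
ANTONYMS = {
    "great": "terrible",
    "terrible": "great",
    "love": "hate",
    "hate": "love",
}


def make_counterfactual(text, label, causal_terms, antonyms=ANTONYMS):
    """Return (counterfactual_text, flipped_label), or None if nothing changed.

    Assumes binary labels encoded as 0 and 1. Text is lowercased and split
    into word and punctuation tokens, then rejoined with single spaces.
    """
    tokens = re.findall(r"\w+|[^\w\s]", text.lower())
    changed = False
    out = []
    for tok in tokens:
        # Only substitute tokens that are both flagged as causal and have
        # a known antonym; everything else is copied through unchanged.
        if tok in causal_terms and tok in antonyms:
            out.append(antonyms[tok])
            changed = True
        else:
            out.append(tok)
    if not changed:
        return None
    return " ".join(out), 1 - label


def augment(dataset, causal_terms):
    """Combine the original (text, label) pairs with their counterfactuals."""
    augmented = list(dataset)
    for text, label in dataset:
        cf = make_counterfactual(text, label, causal_terms)
        if cf is not None:
            augmented.append(cf)
    return augmented
```

For instance, under these assumptions `make_counterfactual("I love this movie", 1, {"love"})` yields the sample `("i hate this movie", 0)`, while a sentence containing no causal term yields `None` and contributes no counterfactual. The combined list returned by `augment` would then be used to train the classifier in place of the original training set.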
