Identifying spurious correlations for robust text classification

The predictions of text classifiers are often driven by spurious correlations. For example, the term 'Spielberg' correlates with positively reviewed movies, even though the term itself does not semantically convey a positive sentiment. In this paper, we propose a method to distinguish spurious from genuine correlations in text classification. We treat this as a supervised classification problem, using features derived from treatment effect estimators to separate the two. Because these features are generic and low-dimensional, we find that the approach works well even with limited training examples, and that the word classifier can be transported to new domains. Experiments on four datasets (sentiment classification and toxicity detection) suggest that using this approach to inform feature selection also leads to more robust classification, as measured by improved worst-case accuracy on the samples affected by spurious correlations.
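To make the idea of a per-term effect feature concrete, the following is a minimal sketch, not the estimators used in the paper (the abstract does not specify them). It assumes tokenized documents with binary labels and computes the simplest possible quantity of this kind: the unadjusted difference in positive-label rate between documents that contain a term and those that do not. The function name `naive_term_effect` and the toy data are hypothetical.

```python
from typing import Sequence


def naive_term_effect(
    docs: Sequence[Sequence[str]], labels: Sequence[int], term: str
) -> float:
    """Unadjusted difference in positive-label rate between documents
    that contain `term` and documents that do not.

    Assumption: this is a plain difference in means with no adjustment
    for context, used here only as an illustration of an effect-style
    feature for a single word.
    """
    with_term = [y for toks, y in zip(docs, labels) if term in toks]
    without_term = [y for toks, y in zip(docs, labels) if term not in toks]
    if not with_term or not without_term:
        raise ValueError("term must occur in some documents and be absent from others")
    return sum(with_term) / len(with_term) - sum(without_term) / len(without_term)


# Hypothetical toy corpus: 1 = positive review, 0 = negative review.
docs = [
    ["spielberg", "great", "film"],
    ["spielberg", "boring", "film"],
    ["great", "acting"],
    ["boring", "plot"],
]
labels = [1, 0, 1, 0]

effect_great = naive_term_effect(docs, labels, "great")
effect_spielberg = naive_term_effect(docs, labels, "spielberg")
```

In this toy corpus 'great' always co-occurs with the positive label while 'spielberg' appears equally in both classes, so the two terms receive very different scores. In real data a spurious term can still show a large unadjusted difference, which is why estimators that control for the surrounding context are of interest.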

[1]  Lucas Dixon,et al.  Ex Machina: Personal Attacks Seen at Scale , 2016, WWW.

[2]  Dawn Xiaodong Song,et al.  Targeted Backdoor Attacks on Deep Learning Systems Using Data Poisoning , 2017, ArXiv.

[3]  Percy Liang,et al.  An Investigation of Why Overparameterization Exacerbates Spurious Correlations , 2020, ICML.

[4]  Michael J. Paul,et al.  Feature Selection as Causal Inference: Experiments with Text Classification , 2017, CoNLL.

[5]  Christopher Winship,et al.  THE ESTIMATION OF CAUSAL EFFECTS FROM OBSERVATIONAL DATA , 1999 .

[6]  Erez Shmueli,et al.  Algorithmic Fairness , 2020, ArXiv.

[7]  Mark Dredze,et al.  Challenges of Using Text Classifiers for Causal Inference , 2018, EMNLP.

[8]  Yulia Tsvetkov,et al.  Topics to Avoid: Demoting Latent Confounds in Text Classification , 2019, EMNLP.

[9]  Foster J. Provost,et al.  Explaining Data-Driven Document Classifications , 2013, MIS Q..

[10]  Hema Raghavan,et al.  InterActive Feature Selection , 2005, IJCAI.

[11]  Bo Pang,et al.  Seeing Stars: Exploiting Class Relationships for Sentiment Categorization with Respect to Rating Scales , 2005, ACL.

[12]  Yoav Goldberg,et al.  Adversarial Removal of Demographic Attributes from Text Data , 2018, EMNLP.

[13]  Burton Richter,et al.  Ups and downs , 1997 .

[14]  Virgile Landeiro,et al.  Robust Text Classification under Confounding Shift , 2018, J. Artif. Intell. Res..

[15]  Katherine A. Keith,et al.  Text and Causal Inference: A Review of Using Text to Remove Confounding from Causal Estimates , 2020, ACL.

[16]  Percy Liang,et al.  Distributionally Robust Neural Networks for Group Shifts: On the Importance of Regularization for Worst-Case Generalization , 2019, ArXiv.

[17]  Gary King,et al.  Comparative Effectiveness of Matching Methods for Causal Inference , 2011 .

[18]  Ming-Wei Chang,et al.  BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.

[19]  Neil D. Lawrence,et al.  Dataset Shift in Machine Learning , 2009 .

[20]  Bernhard Schölkopf,et al.  Towards a Learning Theory of Causation , 2015, 1502.02398.

[21]  Manali Sharma,et al.  Active Learning with Rationales for Text Classification , 2015, NAACL.

[22]  Ankur Taly,et al.  Counterfactual Fairness in Text Classification through Robustness , 2018, AIES.

[23]  Elizabeth A Stuart,et al.  Matching methods for causal inference: A review and a look forward. , 2010, Statistical science : a review journal of the Institute of Mathematical Statistics.

[24]  Eduard Hovy,et al.  Learning the Difference that Makes a Difference with Counterfactually-Augmented Data , 2020, ICLR.

[25]  J. Aldrich Correlations Genuine and Spurious in Pearson and Yule , 1995 .

[26]  Daniel Jurafsky,et al.  Deconfounded Lexicon Induction for Interpretable Social Science , 2018, NAACL.

[27]  Tejashri Inadarchand Jain,et al.  Recognizing Contextual Polarity in Phrase-Level Sentiment Analysis , 2010 .

[28]  Richard A. Nielsen,et al.  Why Propensity Scores Should Not Be Used for Matching , 2019, Political Analysis.

[29]  Yufeng Li,et al.  A Backdoor Attack Against LSTM-Based Text Classification Systems , 2019, IEEE Access.

[30]  A. Heinrichs Ups and downs. , 2001, Trends in molecular medicine.

[31]  Euclid,et al.  Statistical science : a review journal of the Institute of Mathematical Statistics. , 1986 .

[32]  G. Imbens Nonparametric Estimation of Average Treatment Effects Under Exogeneity: A Review , 2004 .

[33]  Aron Culotta,et al.  Characterizing Variation in Toxic Language by Social Context , 2020, ICWSM.

[34]  Franco Turini,et al.  A Survey of Methods for Explaining Black Box Models , 2018, ACM Comput. Surv..

[35]  Christine D. Piatko,et al.  Using “Annotator Rationales” to Improve Machine Learning for Text Categorization , 2007, NAACL.

[36]  Lei Zhang,et al.  Sentiment Analysis and Opinion Mining , 2017, Encyclopedia of Machine Learning and Data Mining.