Topics to Avoid: Demoting Latent Confounds in Text Classification

Despite impressive performance on many text classification tasks, deep neural networks tend to learn frequent superficial patterns that are specific to the training data and do not always generalize well. In this work, we observe this limitation on the task of native language identification. We find that standard text classifiers that perform well on the test set end up learning topical features that are confounds of the prediction task (e.g., if the input text mentions Sweden, the classifier predicts that the author’s native language is Swedish). We propose a method that represents the latent topical confounds, together with a model that “unlearns” confounding features by predicting both the label of the input text and the confound. The two predictors are trained adversarially, in an alternating fashion, so that the model learns a text representation that predicts the correct label but is less prone to exploiting information about the confound. We show that this model generalizes better and learns features that are indicative of writing style rather than content.
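The alternating adversarial setup described above can be sketched roughly as follows. The snippet is a minimal illustration under stated assumptions, not the authors' implementation: it assumes each document's latent topical confound has already been summarized as a single topic id (e.g., the most probable topic under an LDA model), and all module, function, and hyperparameter names (Encoder, label_clf, confound_clf, alpha) are invented for the example. One step trains the confound predictor against a frozen text representation; the other step trains the encoder and label predictor while pushing the confound predictor toward uninformative (high-entropy) outputs.

```python
# A minimal sketch of adversarial confound demotion, NOT the authors' exact
# implementation. Assumptions (all names invented for illustration): each
# document comes with a topic id summarizing its latent topical confound
# (e.g., the argmax topic of an LDA model), and the task is 11-way native
# language identification.
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB, EMB, HID = 10_000, 100, 128
NUM_LABELS, NUM_TOPICS = 11, 50   # 11 native languages; 50 topics is illustrative


class Encoder(nn.Module):
    """Bag-of-embeddings text encoder; a stand-in for any text encoder."""

    def __init__(self):
        super().__init__()
        self.emb = nn.EmbeddingBag(VOCAB, EMB)
        self.proj = nn.Sequential(nn.Linear(EMB, HID), nn.ReLU())

    def forward(self, token_ids, offsets):
        return self.proj(self.emb(token_ids, offsets))


encoder = Encoder()
label_clf = nn.Linear(HID, NUM_LABELS)      # predicts the task label
confound_clf = nn.Linear(HID, NUM_TOPICS)   # adversary: predicts the topical confound

opt_main = torch.optim.Adam(
    list(encoder.parameters()) + list(label_clf.parameters()), lr=1e-3)
opt_adv = torch.optim.Adam(confound_clf.parameters(), lr=1e-3)


def adversary_step(tokens, offsets, topic_ids):
    """Update the confound predictor against a frozen text representation."""
    with torch.no_grad():
        h = encoder(tokens, offsets)
    loss = F.cross_entropy(confound_clf(h), topic_ids)
    opt_adv.zero_grad()
    loss.backward()
    opt_adv.step()


def main_step(tokens, offsets, labels, alpha=1.0):
    """Update encoder + label predictor while demoting confound information:
    the encoder is rewarded when the confound predictor becomes uncertain."""
    h = encoder(tokens, offsets)
    task_loss = F.cross_entropy(label_clf(h), labels)
    log_probs = F.log_softmax(confound_clf(h), dim=-1)
    entropy = -(log_probs.exp() * log_probs).sum(dim=-1).mean()
    loss = task_loss - alpha * entropy   # maximize the adversary's uncertainty
    opt_main.zero_grad()
    loss.backward()
    opt_main.step()


# Alternating training loop (illustrative only):
# for tokens, offsets, labels, topic_ids in loader:
#     adversary_step(tokens, offsets, topic_ids)   # let the adversary catch up
#     main_step(tokens, offsets, labels)           # demote confounding features
```

The entropy penalty is one plausible way to realize the demotion objective; the key design choice in the sketch is that the adversary only ever sees a detached representation, so its own updates cannot help the encoder retain topical information.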
