Making a (Counterfactual) Difference One Rationale at a Time

Rationales, snippets of extracted text that explain an inference, have emerged as a popular framework for interpretable natural language processing (NLP). Rationale models typically consist of two cooperating modules, a selector and a classifier, trained to maximize the mutual information (MMI) between the "selected" text and the document label. Despite their promise, MMI-based methods often latch onto spurious text patterns and yield models with nonsensical behaviors. In this work, we investigate whether counterfactual data augmentation (CDA), without human assistance, can improve the performance of the selector by lowering the mutual information between spurious signals and the document label. Our counterfactuals are produced in an unsupervised fashion using class-dependent generative models. Through an information-theoretic lens, we derive properties of the unaugmented dataset under which our CDA approach would succeed. We empirically evaluate the effectiveness of CDA by comparing it against several baselines, including an improved MMI-based rationale schema [19], on two multi-aspect datasets. Our results show that CDA produces rationales that better capture the signal of interest.
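As a rough sketch of the objective described above (the notation here is assumed for illustration and is not taken from the paper): let $X$ be the input text, $Y$ the document label, and $R = g(X)$ the rationale produced by the selector $g$ under a sparsity budget $k$. The MMI framework trains the selector-classifier pair to solve

$$\max_{g} \; I\bigl(g(X); Y\bigr) \quad \text{subject to} \quad |g(X)| \le k,$$

so if a spurious feature $S \subseteq X$ satisfies $I(S; Y) \ge I(R^{*}; Y)$ for the desired rationale $R^{*}$, the selector is rewarded for picking $S$ instead. On this reading, CDA constructs an augmented data distribution $\tilde{P}$ under which $I_{\tilde{P}}(S; Y)$ is driven down, so that maximizing mutual information under $\tilde{P}$ no longer favors the spurious text.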

[1] Tao Zhang, et al. Mask and Infill: Applying Masked Language Model for Sentiment Transfer, 2019, IJCAI.

[2] Anupam Datta, et al. Gender Bias in Neural Natural Language Processing, 2018, Logic, Language, and Security.

[3] Ming-Wei Chang, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.

[4] Yue Lu, et al. Latent aspect rating analysis on review text data: a rating regression approach, 2010, KDD.

[5] Andreas Geiger, et al. Counterfactual Generative Networks, 2021, ICLR.

[6] Byron C. Wallace, et al. ERASER: A Benchmark to Evaluate Rationalized NLP Models, 2020, ACL.

[7] Wiebke Wagner, et al. Steven Bird, Ewan Klein and Edward Loper: Natural Language Processing with Python, Analyzing Text with the Natural Language Toolkit, 2010, Lang. Resour. Evaluation.

[8] Ben Poole, et al. Categorical Reparameterization with Gumbel-Softmax, 2016, ICLR.

[9] Eduard Hovy, et al. Explaining the Efficacy of Counterfactually-Augmented Data, 2020, arXiv.

[10] Jure Leskovec, et al. Learning Attitudes and Attributes from Multi-aspect Reviews, 2012, IEEE 12th International Conference on Data Mining (ICDM).

[11] Yiming Yang, et al. Politeness Transfer: A Tag and Generate Approach, 2020, ACL.

[12] Tommi S. Jaakkola, et al. Invariant Rationalization, 2020, ICML.

[13] Ronald J. Williams. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning, 1992, Machine Learning.

[14] Yin Zhang, et al. Counterfactual Generator: A Weakly-Supervised Method for Named Entity Recognition, 2020, EMNLP.

[15] Thomas M. Cover, et al. Elements of Information Theory (Wiley Series in Telecommunications and Signal Processing), 2006.

[16] Le Song, et al. Learning to Explain: An Information-Theoretic Perspective on Model Interpretation, 2018, ICML.

[17] Lukasz Kaiser, et al. Attention Is All You Need, 2017, NIPS.

[18] Eric P. Xing, et al. Toward Controlled Generation of Text, 2017, ICML.

[19] Omer Levy, et al. SpanBERT: Improving Pre-training by Representing and Predicting Spans, 2019, TACL.

[20] Yoav Goldberg, et al. Aligning Faithful Interpretations with their Social Attribution, 2020, arXiv.

[21] Regina Barzilay, et al. Deriving Machine Attention from Human Rationales, 2018, EMNLP.

[22] Percy Liang, et al. Delete, Retrieve, Generate: a Simple Approach to Sentiment and Style Transfer, 2018, NAACL.

[23] Thomas M. Cover, et al. Elements of Information Theory, 2005.

[24] Regina Barzilay, et al. Rationalizing Neural Predictions, 2016, EMNLP.

[25] Lantao Yu, et al. SeqGAN: Sequence Generative Adversarial Nets with Policy Gradient, 2016, AAAI.

[26] Tommi S. Jaakkola, et al. Rethinking Cooperative Rationalization: Introspective Extraction and Complement Control, 2019, EMNLP.

[27] Eduard Hovy, et al. Learning the Difference that Makes a Difference with Counterfactually-Augmented Data, 2020, ICLR.

[28] Yoshua Bengio, et al. Generative Adversarial Nets, 2014, NIPS.

[29] George A. Miller, et al. WordNet: A Lexical Database for English, 1995, HLT.

[30] Andrew Slavin Ross, et al. Right for the Right Reasons: Training Differentiable Models by Constraining their Explanations, 2017, IJCAI.

[31] Byron C. Wallace, et al. Learning to Faithfully Rationalize by Construction, 2020, ACL.

[32] Yoshua Bengio, et al. Estimating or Propagating Gradients Through Stochastic Neurons for Conditional Computation, 2013, arXiv.