SPECTRA: Sparse Structured Text Rationalization

Selective rationalization aims to produce decisions along with rationales (e.g., text highlights or word alignments between two sentences). Commonly, rationales are modeled as stochastic binary masks, requiring sampling-based gradient estimators, which complicates training and requires careful hyperparameter tuning. Sparse attention mechanisms are a deterministic alternative, but they lack a way to regularize the rationale extraction (e.g., to control the sparsity of a text highlight or the number of alignments). In this paper, we present a unified framework for deterministic extraction of structured explanations via constrained inference on a factor graph, forming a differentiable layer. Our approach greatly eases training and rationale regularization, generally outperforming previous work in terms of predictive performance and plausibility of the extracted rationales. We further provide a comparative study of stochastic and deterministic methods for rationale extraction on classification and natural language inference tasks, jointly assessing their predictive power, the quality of the explanations, and model variability.
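To make the stochastic-versus-deterministic contrast concrete, below is a minimal PyTorch sketch of a deterministic rationalizer in the spirit of the sparse-attention baseline: per-token scores are mapped through sparsemax (Martins and Astudillo, 2016), yielding a sparse rationale mask that is differentiable end to end, so no sampling-based gradient estimator (REINFORCE, Gumbel reparameterization) is needed. All module and variable names here are illustrative; SPECTRA itself replaces the plain sparsemax with constrained SparseMAP inference on a factor graph, which this sketch does not implement.

```python
import torch


def sparsemax(z: torch.Tensor) -> torch.Tensor:
    """Sparsemax over the last dimension (Martins & Astudillo, 2016):
    Euclidean projection of the scores onto the probability simplex,
    which drives many coordinates exactly to zero."""
    z_sorted, _ = torch.sort(z, dim=-1, descending=True)
    k = torch.arange(1, z.size(-1) + 1, device=z.device, dtype=z.dtype)
    cssv = z_sorted.cumsum(dim=-1) - 1.0
    support = k * z_sorted > cssv                  # coordinates kept in the support
    k_support = support.sum(dim=-1, keepdim=True)  # support size per row
    tau = cssv.gather(-1, k_support - 1) / k_support.to(z.dtype)
    return torch.clamp(z - tau, min=0.0)


class SparsemaxRationalizer(torch.nn.Module):
    """Toy deterministic rationalizer (hypothetical, for illustration):
    a scorer assigns one score per token, sparsemax turns the scores
    into a sparse rationale mask, and the masked token representations
    are pooled and classified. Gradients flow end to end through the
    mask, so training needs no sampling-based estimator."""

    def __init__(self, d_model: int, n_classes: int):
        super().__init__()
        self.scorer = torch.nn.Linear(d_model, 1)
        self.classifier = torch.nn.Linear(d_model, n_classes)

    def forward(self, h: torch.Tensor):         # h: (batch, seq, d_model)
        scores = self.scorer(h).squeeze(-1)     # (batch, seq)
        mask = sparsemax(scores)                # deterministic sparse mask
        pooled = (mask.unsqueeze(-1) * h).sum(dim=1)
        return self.classifier(pooled), mask    # class logits + rationale


h = torch.randn(2, 7, 16)                       # dummy token encodings
model = SparsemaxRationalizer(d_model=16, n_classes=3)
logits, mask = model(h)                         # mask rows sum to 1, many exact zeros
```

Note that plain sparsemax offers no direct handle on rationale structure (e.g., a budget on highlighted words or contiguity of highlights); adding such constraints is precisely what the factor-graph formulation in the paper provides.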
