On the Interaction of Belief Bias and Explanations

A myriad of explainability methods have been proposed in recent years, but there is little consensus on how to evaluate them. While automatic metrics allow for quick benchmarking, it is unclear how well such metrics reflect human interaction with explanations. Human evaluation is of paramount importance, but previous protocols fail to account for belief biases affecting human performance, which may lead to misleading conclusions. We provide an overview of belief bias, its role in human evaluation, and ideas for how NLP practitioners can account for it. In a case study of gradient-based explainability spanning two experimental paradigms, we introduce simple controls for humans' prior beliefs: models of varying quality and adversarial examples. We show that conclusions about the best-performing methods change when such controls are introduced, underscoring the importance of accounting for belief bias in evaluation.
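For readers unfamiliar with the family of methods evaluated here: gradient-based explainability attributes a model's prediction to its input features via the gradient of the output with respect to the input. A minimal sketch on a toy logistic model, using the gradient×input variant (the model, feature names, and variant choice are illustrative assumptions, not the paper's exact setup):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_x_input_saliency(w, b, x):
    """Gradient x input attribution for a toy logistic 'model'.

    For p = sigmoid(w . x + b), the gradient w.r.t. x is p * (1 - p) * w,
    so feature i receives attribution |p * (1 - p) * w_i * x_i|.
    """
    p = sigmoid(np.dot(w, x) + b)
    grad = p * (1.0 - p) * w          # analytic input gradient
    return np.abs(grad * x)           # gradient x input, magnitude only

# Toy example: three bag-of-words "token" features.
w = np.array([2.0, -1.0, 0.5])        # learned weights (illustrative)
x = np.array([1.0, 1.0, 0.0])         # tokens 0 and 1 present, token 2 absent
sal = gradient_x_input_saliency(w, b=0.0, x=x)
# Absent tokens get zero attribution; present tokens are ranked by |w_i|.
```

In real models the gradient is obtained by backpropagation rather than analytically, but the resulting per-token scores are read by human evaluators in exactly this ranked-importance form, which is where prior beliefs can interfere.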
