On the Interaction of Belief Bias and Explanations

A myriad of explainability methods have been proposed in recent years, but there is little consensus on how to evaluate them. While automatic metrics allow for quick benchmarking, it is unclear how well such metrics reflect human interaction with explanations. Human evaluation is of paramount importance, but previous protocols fail to account for belief biases affecting human performance, which may lead to misleading conclusions. We provide an overview of belief bias, its role in human evaluation, and ideas for how NLP practitioners can account for it. In a case study of gradient-based explainability spanning two experimental paradigms, we introduce simple controls for humans' prior beliefs: models of varying quality and adversarial examples. We show that conclusions about the best-performing methods change when such controls are introduced, underscoring the importance of accounting for belief bias in evaluation.
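For readers unfamiliar with the family of methods evaluated here: gradient-based explainability attributes a model's prediction to its input features via the gradient of the output with respect to the input. A minimal sketch on a toy logistic model, using the gradient×input variant (the model, feature names, and variant choice are illustrative assumptions, not the paper's exact setup):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_x_input_saliency(w, b, x):
    """Gradient x input attribution for a toy logistic 'model'.

    For p = sigmoid(w . x + b), the gradient w.r.t. x is p * (1 - p) * w,
    so feature i receives attribution |p * (1 - p) * w_i * x_i|.
    """
    p = sigmoid(np.dot(w, x) + b)
    grad = p * (1.0 - p) * w          # analytic input gradient
    return np.abs(grad * x)           # gradient x input, magnitude only

# Toy example: three bag-of-words "token" features.
w = np.array([2.0, -1.0, 0.5])        # learned weights (illustrative)
x = np.array([1.0, 1.0, 0.0])         # tokens 0 and 1 present, token 2 absent
sal = gradient_x_input_saliency(w, b=0.0, x=x)
# Absent tokens get zero attribution; present tokens are ranked by |w_i|.
```

In real models the gradient is obtained by backpropagation rather than analytically, but the resulting per-token scores are read by human evaluators in exactly this ranked-importance form, which is where prior beliefs can interfere.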
