The Promise and Peril of Human Evaluation for Model Interpretability

Transparency, user trust, and human comprehension are popular ethical motivations for interpretable machine learning. In support of these goals, researchers evaluate model explanation performance using humans and real-world applications; such evaluation is itself a challenge in many areas of artificial intelligence. In this position paper, we propose a distinction between descriptive and persuasive explanations. We discuss reasoning suggesting that functional interpretability may be correlated with cognitive function and user preferences. If this is indeed the case, evaluation and optimization using functional metrics could perpetuate implicit cognitive bias in explanations, threatening transparency. Finally, we propose two potential research directions to disambiguate cognitive function from explanation models while retaining control over the trade-off between accuracy and interpretability.
