Considerations for Evaluation and Generalization in Interpretable Machine Learning

As machine learning systems become ubiquitous, there has been a surge of interest in interpretable machine learning: systems that provide explanations for their outputs. These explanations are often used to qualitatively assess other criteria such as safety or non-discrimination. However, despite this interest, there is little consensus on what interpretable machine learning is or how it should be measured and evaluated. In this paper, we discuss definitions of interpretability and describe when interpretability is needed (and when it is not). We then present a taxonomy for rigorous evaluation and offer recommendations for researchers. We conclude with open questions and concrete problems for new researchers.
