Optimal Piecewise Approximations for Model Interpretation

Recent literature interprets the predictions of "black-box" machine learning models (neural networks, random forests, etc.) by approximating them with simpler models, such as piecewise linear or piecewise constant ones. This literature, however, provides no guarantees on how well these approximations reflect the behavior of the predictive model, which can result in misleading interpretations. We provide a tractable dynamic programming algorithm that partitions the feature space into subsets and assigns a local constant or linear model to each subset, yielding a piecewise constant or piecewise linear interpretation of an arbitrary predictive model. When the approximation loss (between the interpretation and the predictive model) is measured in terms of mean squared error, our approximation is optimal; for more general loss functions, it is approximately optimal. Therefore, in both cases the interpretation probably approximately correctly (PAC) learns the predictive model. Experiments with real and synthetic data show that it provides significant improvements, in terms of mean squared error, over competing approaches. We also present real use cases that establish the utility of the proposed approach.
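
To make the core idea concrete, the sketch below shows an MSE-optimal piecewise-constant approximation of a model's predictions along a single feature, computed by dynamic programming. This is an illustration of the general technique under simplifying assumptions (one dimension, a fixed segment budget), not the paper's algorithm; the names `fit_piecewise_constant` and `num_pieces` are our own illustrative choices.

```python
import numpy as np

def fit_piecewise_constant(y, num_pieces):
    """MSE-optimal piecewise-constant fit of a 1-D sequence y with
    `num_pieces` segments, via dynamic programming. Prefix sums give
    O(1) segment costs, so the total runtime is O(n^2 * num_pieces)."""
    n = len(y)
    assert 1 <= num_pieces <= n
    # Prefix sums of y and y^2: the SSE of fitting the mean to any
    # segment [i, j] follows in constant time.
    ps = np.concatenate([[0.0], np.cumsum(y)])
    ps2 = np.concatenate([[0.0], np.cumsum(np.square(y))])

    def seg_cost(i, j):  # SSE of the best constant on y[i..j] (inclusive)
        s, s2, m = ps[j + 1] - ps[i], ps2[j + 1] - ps2[i], j - i + 1
        return s2 - s * s / m

    INF = float("inf")
    # dp[k][j] = minimal SSE covering y[0..j] with exactly k segments.
    dp = np.full((num_pieces + 1, n), INF)
    cut = np.zeros((num_pieces + 1, n), dtype=int)  # backpointers
    for j in range(n):
        dp[1][j] = seg_cost(0, j)
    for k in range(2, num_pieces + 1):
        for j in range(k - 1, n):
            for i in range(k - 1, j + 1):  # last segment is y[i..j]
                c = dp[k - 1][i - 1] + seg_cost(i, j)
                if c < dp[k][j]:
                    dp[k][j], cut[k][j] = c, i
    # Trace back the optimal segment boundaries, then fill in means.
    k, j, bounds = num_pieces, n - 1, []
    while k >= 1:
        i = 0 if k == 1 else cut[k][j]
        bounds.append((i, j))
        j, k = i - 1, k - 1
    fit = np.empty(n)
    for i, j in bounds:
        fit[i : j + 1] = y[i : j + 1].mean()
    return fit

# Usage: approximate a black-box model's 1-D predictions with 3 pieces.
x = np.linspace(0, 1, 50)
preds = np.sin(3 * x) + 0.1 * (x > 0.5)  # stand-in for black-box output f(x)
approx = fit_piecewise_constant(preds, num_pieces=3)
print("MSE:", np.mean((preds - approx) ** 2))
```

Because every possible placement of the last segment boundary is examined and segment costs are exact, the returned fit attains the minimum possible MSE among all piecewise-constant functions with the given segment budget, mirroring the optimality claim above for the one-dimensional constant case.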