Diagnostic Curves for Black Box Models

In safety-critical applications of machine learning, it is often necessary to look beyond standard metrics such as test accuracy in order to validate qualitative properties of a model: monotonicity with respect to a feature or combination of features, the absence of undesirable changes or oscillations in the response, and differences in outcomes (e.g., discrimination) for a protected class. To address this need, we propose a framework for approximately validating (or invalidating) such properties of a black box model by finding a univariate diagnostic curve in the input space whose output maximally violates a given property. These diagnostic curves show the exact value of the model along the curve and can be displayed with a simple and intuitive line graph. We demonstrate the usefulness of diagnostic curves across multiple use cases and datasets, including selecting between two models and understanding out-of-sample behavior.
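
As an illustration only, the following sketch shows the basic idea of evaluating and plotting a black box model along a univariate curve in input space; it uses a fixed straight-line segment rather than the optimized, property-violating curve the paper proposes, and the names `diagnostic_line`, `monotonicity_violation`, and `model_fn` are hypothetical.

```python
import numpy as np
import matplotlib.pyplot as plt


def diagnostic_line(model_fn, x_start, x_end, n_points=200):
    """Evaluate a black box model along a straight-line path in input space.

    Illustrative stand-in: the paper searches over curves to find one that
    maximally violates a property; here we simply inspect a fixed segment.
    """
    ts = np.linspace(0.0, 1.0, n_points)
    # Points along the segment x(t) = x_start + t * (x_end - x_start).
    path = x_start[None, :] + ts[:, None] * (x_end - x_start)[None, :]
    values = model_fn(path)  # model outputs along the curve
    return ts, values


def monotonicity_violation(values):
    """Total decrease along the curve; zero iff the output is nondecreasing."""
    diffs = np.diff(values)
    return float(-np.minimum(diffs, 0.0).sum())


if __name__ == "__main__":
    # Toy "black box" model for demonstration (assumed: maps (n, d) -> (n,)).
    model_fn = lambda X: np.sin(3 * X[:, 0]) + X[:, 1]
    rng = np.random.default_rng(0)
    x0, x1 = rng.normal(size=2), rng.normal(size=2)

    ts, vals = diagnostic_line(model_fn, x0, x1)
    print("monotonicity violation:", monotonicity_violation(vals))

    # The diagnostic curve is displayed as a simple line graph.
    plt.plot(ts, vals)
    plt.xlabel("position t along curve")
    plt.ylabel("model output")
    plt.title("Diagnostic curve (illustrative)")
    plt.show()
```

In the framework described above, the segment endpoints (or a richer curve parameterization) would instead be chosen by an optimization that maximizes the violation score, so the resulting line graph surfaces the worst-case behavior with respect to the property of interest.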
