Explainable machine learning in deployment

Explainable machine learning offers the potential to provide stakeholders with insights into model behavior through methods such as feature importance scores, counterfactual explanations, and influential training data. Yet there is little understanding of how organizations use these methods in practice. This study explores how organizations view and use explainability for stakeholder consumption. We find that, currently, the majority of deployments are not for end users affected by the model but rather for machine learning engineers, who use explainability to debug the model itself. There is thus a gap between explainability in practice and the goal of transparency, since explanations primarily serve internal stakeholders rather than external ones. Our study synthesizes the limitations of current explainability techniques that hamper their use by end users. To facilitate end-user interaction, we develop a framework for establishing clear goals for explainability. We conclude by discussing concerns raised regarding the deployment of explainability.
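
As a concrete illustration of the first class of methods named above (feature importance scores), the short Python sketch below shows how a machine learning engineer might inspect which features a model relies on while debugging it. It is a minimal, assumed example built on scikit-learn's permutation importance and a toy dataset; it is not the deployment pipelines surveyed in this study.

```python
# Minimal sketch (assumed example): feature importance scores for model debugging.
# Uses scikit-learn's permutation importance on a toy dataset, not the actual
# organizational pipelines discussed in the study.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Toy data and model stand in for a deployed system.
X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

# Permutation importance: how much does held-out performance drop when each
# feature's values are shuffled? Larger drops suggest heavier reliance.
result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)

# Print the five features the model appears to rely on most.
ranked = sorted(
    zip(X.columns, result.importances_mean, result.importances_std),
    key=lambda item: item[1],
    reverse=True,
)
for name, mean, std in ranked[:5]:
    print(f"{name}: {mean:.3f} +/- {std:.3f}")
```

In a debugging workflow of this kind, an unexpectedly large importance for a proxy or leaked feature is the sort of signal an engineer looks for, which reflects the paper's finding that explanations are mostly consumed internally rather than by affected end users.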
