"Why Should I Trust You?": Explaining the Predictions of Any Classifier

Despite widespread adoption, machine learning models remain mostly black boxes. Understanding the reasons behind predictions is, however, quite important in assessing trust, which is fundamental if one plans to take action based on a prediction, or when choosing whether to deploy a new model. Such understanding also provides insights into the model, which can be used to transform an untrustworthy model or prediction into a trustworthy one. In this work, we propose LIME, a novel explanation technique that explains the predictions of any classifier in an interpretable and faithful manner, by learning an interpretable model locally around the prediction. We also propose a method to explain models by presenting representative individual predictions and their explanations in a non-redundant way, framing the task as a submodular optimization problem. We demonstrate the flexibility of these methods by explaining different models for text (e.g. random forests) and image classification (e.g. neural networks). We show the utility of explanations via novel experiments, both simulated and with human subjects, on various scenarios that require trust: deciding if one should trust a prediction, choosing between models, improving an untrustworthy classifier, and identifying why a classifier should not be trusted.
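The core idea sketched in the abstract (fit an interpretable model locally around a single prediction by perturbing the input, querying the black box, and weighting samples by proximity) can be illustrated with a minimal Python sketch. This is not the authors' reference implementation: the function name `explain_instance`, the binary-masking interpretable representation, and the parameters `num_samples` and `kernel_width` are illustrative assumptions, and a sparse or ridge-penalized linear model stands in for the generic class of interpretable surrogates.

```python
# Minimal sketch of a local-surrogate explanation, assuming a binary "feature
# on/off" interpretable representation and a ridge surrogate. Names and
# defaults are illustrative, not the paper's reference implementation.
import numpy as np
from sklearn.linear_model import Ridge

def explain_instance(x, predict_proba, num_samples=5000, kernel_width=0.75):
    """Return per-feature weights of a local linear surrogate around x."""
    d = x.shape[0]
    # Perturb the instance by randomly masking features.
    masks = np.random.randint(0, 2, size=(num_samples, d))
    masks[0] = 1                               # keep the original instance itself
    perturbed = masks * x                      # masked-out features set to zero
    labels = predict_proba(perturbed)[:, 1]    # black-box predictions to imitate
    # Weight each perturbation by its similarity to the original instance.
    distances = np.sqrt(((masks - 1) ** 2).sum(axis=1)) / np.sqrt(d)
    weights = np.exp(-(distances ** 2) / kernel_width ** 2)
    # Fit an interpretable (linear) model locally; its coefficients explain x.
    surrogate = Ridge(alpha=1.0)
    surrogate.fit(masks, labels, sample_weight=weights)
    return surrogate.coef_
```

The returned coefficients rank features by their local influence on the black-box prediction; selecting a non-redundant set of such explanations across instances is what the abstract frames as a submodular optimization problem.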
