MEANINGFULLY EXPLAINING MODEL MISTAKES USING CONCEPTUAL COUNTERFACTUALS

Understanding and explaining the mistakes made by trained models is critical to many machine learning objectives, such as improving robustness, addressing concept drift, and mitigating biases. However, this is often an ad hoc process that involves manually inspecting the model's mistakes on many test samples and guessing at the underlying reasons for those incorrect predictions. In this paper, we propose a systematic approach, conceptual counterfactual explanations (CCE), that explains why a classifier makes a mistake on a particular test sample (or set of samples) in terms of human-understandable concepts (e.g., this zebra is misclassified as a dog because of faint stripes). We base CCE on two prior ideas, counterfactual explanations and concept activation vectors, and validate our approach on well-known pretrained models, showing that it explains the models' mistakes meaningfully. In addition, for new models trained on data with spurious correlations, CCE accurately identifies the spurious correlation as the cause of model mistakes from a single misclassified test sample. On two challenging medical applications, CCE generated useful insights, confirmed by clinicians, into the biases and mistakes the model makes in real-world settings. The code for CCE is publicly available and can easily be applied to explain mistakes in new models.
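The abstract describes CCE as combining concept activation vectors with counterfactual reasoning in the model's embedding space. The sketch below illustrates that general idea only; it is not the released implementation. It assumes the classifier splits into an embedding function and a linear head, and every name here (learn_concept_vectors, conceptual_counterfactual, concept_examples, etc.) is hypothetical.

import torch
import torch.nn.functional as F

def learn_concept_vectors(concept_examples, random_examples, steps=200, lr=0.1):
    """Fit one unit-norm concept direction per concept by training a logistic
    regression that separates concept-example embeddings from random embeddings
    (in the spirit of concept activation vectors)."""
    cavs = {}
    for name, pos in concept_examples.items():
        X = torch.cat([pos, random_examples])                   # (n, d) embeddings
        y = torch.cat([torch.ones(len(pos)), torch.zeros(len(random_examples))])
        w = torch.zeros(X.shape[1], requires_grad=True)
        b = torch.zeros(1, requires_grad=True)
        opt = torch.optim.Adam([w, b], lr=lr)
        for _ in range(steps):
            opt.zero_grad()
            loss = F.binary_cross_entropy_with_logits(X @ w + b, y)
            loss.backward()
            opt.step()
        cavs[name] = (w / w.norm()).detach()                    # unit-norm concept direction
    return cavs

def conceptual_counterfactual(embedding, true_label, head, cavs,
                              steps=100, lr=0.1, l1=0.01):
    """Learn sparse per-concept scores alpha so that adding
    sum_c alpha_c * cav_c to the misclassified sample's embedding drives the
    linear head toward the true label; concepts with large |alpha_c| form the
    conceptual explanation of the mistake."""
    embedding = embedding.detach()
    names = list(cavs)
    C = torch.stack([cavs[n] for n in names])                   # (k, d) concept directions
    alpha = torch.zeros(len(names), requires_grad=True)         # concept scores to learn
    opt = torch.optim.Adam([alpha], lr=lr)
    target = torch.tensor([true_label])
    for _ in range(steps):
        opt.zero_grad()
        logits = head((embedding + alpha @ C).unsqueeze(0))
        loss = F.cross_entropy(logits, target) + l1 * alpha.abs().sum()
        loss.backward()
        opt.step()
    # Positive scores suggest concepts the sample was "missing"; negative scores
    # suggest spurious concepts that were present in the sample.
    return sorted(zip(names, alpha.detach().tolist()), key=lambda t: -abs(t[1]))

For the zebra example in the abstract, a large positive score for a "stripes" concept would be read as "the model would have classified this sample correctly had the stripes been more prominent."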
