Gifsplanation via Latent Shift: A Simple Autoencoder Approach to Counterfactual Generation for Chest X-rays

Motivation: Traditional image attribution methods struggle to satisfactorily explain the predictions of neural networks. Explaining predictions is especially important in medical imaging, where false positive predictions can impact patient care, so there is a pressing need for improved methods for model explainability and introspection.

Specific problem: A recent line of work transforms input images to increase or decrease the features that cause a prediction. However, current approaches are difficult to implement because they are monolithic or rely on GANs, hurdles that prevent wide adoption.

Our approach: Given an arbitrary classifier, we propose a simple autoencoder and gradient update (Latent Shift) that transforms the latent representation of a specific input image to exaggerate or curtail the features used for prediction. We use this method to study chest X-ray classifiers and evaluate their performance. We conduct a reader study in which two radiologists assess 240 chest X-ray predictions, half of them false positives, and attempt to identify the false positives using either traditional attribution maps or our proposed method.

Results: For models with reasonably high accuracy, we found low overlap with ground-truth pathology masks; however, the reader study indicates that these models are generally looking at the correct features. We also found that Latent Shift explanations give users more confidence in true positive predictions than traditional approaches (0.15±0.95 on a 5-point scale, p=0.01), with only a small increase in confidence for false positive predictions (0.04±1.06, p=0.57).
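To make the approach concrete, below is a minimal PyTorch sketch of the Latent Shift update described above: encode the image, take the gradient of the classifier's output (computed on the decoded reconstruction) with respect to the latent code, then decode a sweep of shifted codes. The names `encoder`, `decoder`, `classifier`, and `latent_shift` are placeholders for pretrained components, and the sign convention and λ range are illustrative assumptions rather than the paper's exact implementation.

```python
import torch

def latent_shift(encoder, decoder, classifier, x, lambdas):
    """Sketch of Latent Shift counterfactual generation.

    encoder/decoder: a pretrained autoencoder (placeholder names).
    classifier: the frozen classifier being explained.
    x: input image batch of shape (N, C, H, W).
    lambdas: iterable of shift magnitudes to sweep.
    """
    # Encode the input; track gradients through the latent code only.
    z = encoder(x).detach().requires_grad_(True)

    # Gradient of the classifier's prediction (on the reconstruction)
    # with respect to the latent representation.
    y = classifier(decoder(z))
    grad = torch.autograd.grad(y.sum(), z)[0]

    # Decode shifted latent codes: moving along the gradient (positive
    # lambda) exaggerates the predicted feature, moving against it
    # (negative lambda) curtails it.
    frames = []
    with torch.no_grad():
        for lam in lambdas:
            frames.append(decoder(z + lam * grad))
    return frames
```

Rendering the decoded frames for a λ sweep from negative to positive as an animation yields the "gifsplanation": a short clip in which the feature driving the prediction is first removed and then exaggerated.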
