A Framework for Interpretability in Machine Learning for Medical Imaging

Interpretability for machine learning models in medical imaging (MLMI) is an important direction of research. However, what interpretability actually means in this setting remains murky. Why does the need for interpretability in MLMI arise? What goals does one seek to address when interpretability is needed? To answer these questions, we identify a need to formalize the goals and elements of interpretability in MLMI. By reasoning about real-world tasks and goals common to medical image analysis and its intersection with machine learning, we identify four core elements of interpretability: localization, visual recognizability, physical attribution, and transparency. Overall, this paper formalizes interpretability needs in the context of medical imaging, and our applied perspective clarifies concrete MLMI-specific goals and considerations in order to guide method design and improve real-world usage. Our goal is to provide practical and didactic information for model designers and practitioners, to inspire developers of models in the medical imaging field to reason more deeply about what interpretability is achieving, and to suggest future directions of interpretability research.
