Current Challenges and Future Opportunities for XAI in Machine Learning-Based Clinical Decision Support Systems: A Systematic Review

Machine Learning (ML) and, more broadly, Artificial Intelligence (AI) have great immediate and future potential for transforming almost all aspects of medicine. However, in many domains, even outside medicine, a lack of transparency in AI systems has become increasingly problematic. This is particularly pronounced where users need to interpret the output of AI systems. Explainable AI (XAI) provides a rationale that allows users to understand why a system has produced a given output. The output can then be interpreted within a given context. One area that is in great need of XAI is that of Clinical Decision Support Systems (CDSSs). These systems support medical practitioners in their clinical decision-making and, in the absence of explainability, may lead to issues of under- or over-reliance. Providing explanations for how recommendations are arrived at will allow practitioners to make more nuanced and, in some cases, life-saving decisions. The need for XAI in CDSSs, and in the medical field in general, is amplified by the need for ethical and fair decision-making and by the fact that AI trained on historical data can reinforce historical actions and biases that should be uncovered. We performed a systematic literature review of work to date on the application of XAI in CDSSs. XAI-enabled systems that process tabular data are the most common, while XAI-enabled CDSSs for text analysis are the least common in the literature. Developers show more interest in providing local explanations, while post-hoc and ante-hoc explanations, as well as model-specific and model-agnostic techniques, are almost equally represented. Studies reported benefits of XAI such as enhancing clinicians' decision confidence and generating hypotheses about causality, which ultimately increase the trustworthiness and acceptability of the system and its potential for incorporation into the clinical workflow. However, we found an overall distinct lack of application of XAI in the context of CDSSs and, in particular, a lack of user studies exploring the needs of clinicians. We propose guidelines for the implementation of XAI in CDSSs and explore opportunities, challenges, and future research needs.
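To make the taxonomy used in the review concrete, the following is a minimal sketch (not drawn from any of the reviewed studies) of a post-hoc, model-agnostic, local explanation for a tabular clinical risk model: for a single patient, each feature is replaced by its training-set mean and the resulting change in the predicted probability is reported as that feature's contribution. The dataset, model choice, and attribution scheme are illustrative assumptions, not a reference implementation of any system discussed above.

```python
# Post-hoc, model-agnostic, local explanation sketch for a tabular classifier.
# Assumptions: scikit-learn's breast cancer dataset stands in for clinical
# tabular data; a random forest stands in for the underlying CDSS model.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X_train, y_train)

def local_explanation(model, x_row, background):
    """Crude occlusion-style attribution: change in P(class=1) when each
    feature of one instance is replaced by its background (training) mean."""
    baseline = model.predict_proba(x_row.to_frame().T)[0, 1]
    means = background.mean()
    contributions = {}
    for feature in x_row.index:
        x_perturbed = x_row.copy()
        x_perturbed[feature] = means[feature]
        p = model.predict_proba(x_perturbed.to_frame().T)[0, 1]
        contributions[feature] = baseline - p  # >0: feature pushes the prediction up
    return baseline, contributions

patient = X_test.iloc[0]
risk, contrib = local_explanation(model, patient, X_train)
top = sorted(contrib.items(), key=lambda kv: abs(kv[1]), reverse=True)[:5]
print(f"Predicted probability: {risk:.3f}")
for name, delta in top:
    print(f"  {name}: {delta:+.3f}")
```

Because the explanation only queries model.predict_proba, the same code applies unchanged to any classifier (model-agnostic); an ante-hoc alternative would instead use an intrinsically interpretable model such as a rule list or a shallow decision tree.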
