Deploying clinical machine learning? Consider the following

Despite intense attention and considerable investment in clinical machine learning (CML) research, relatively few applications have been deployed at large scale in real-world clinical environments. While research is important for advancing the state of the art, translation is equally important for bringing these techniques and technologies into a position to ultimately impact healthcare. We believe that a lack of appreciation for several considerations is a major cause of this discrepancy between expectation and reality. To better characterize a holistic perspective among researchers and practitioners, we survey several practitioners with commercial experience in developing CML for clinical deployment. Using these insights, we identify several main categories of challenges, with the aim of better designing and developing clinical machine learning applications.
