Towards a framework for evaluating the safety, acceptability and efficacy of AI systems for health: an initial synthesis

The potential of Artificial Intelligence (AI) for healthcare has long been recognised by the technical community. More recently, policymakers have also recognised this potential, resulting in considerable public and private investment in the development of AI for healthcare across the globe. Despite this, and apart from a few success stories, real-world implementation of AI systems in frontline healthcare has been limited. There are numerous reasons for this, but a major contributing factor is the lack of internationally accepted, or formalised, regulatory standards for assessing the safety, impact and effectiveness of AI. This is a well-recognised problem, and numerous ongoing research and policy projects seek to overcome it. Our intention here is to contribute to this effort by setting out a minimally viable framework for evaluating the safety, acceptability and efficacy of AI systems for healthcare. To do this, we conduct a systematic search of Scopus, PubMed and Google Scholar to identify all relevant literature published between January 1970 and November 2020 on the evaluation of the output performance, efficacy and real-world use of AI systems. We then synthesise the key themes according to the stages of evaluation: the pre-clinical (theoretical) phase, the exploratory phase, the definitive phase and the post-market surveillance (monitoring) phase. The result is a framework to guide AI system developers, policymakers and regulators through a sufficient evaluation of an AI system designed for use in healthcare.
