Artificial Intelligence in Clinical Decision Support: Challenges for Evaluating AI and Practical Implications

Summary

Objectives: This paper draws attention to (i) key considerations for evaluating artificial intelligence (AI)-enabled clinical decision support, and (ii) the challenges and practical implications of AI design, development, selection, use, and ongoing surveillance.

Method: A narrative review of existing research and evaluation approaches, combined with expert perspectives drawn from the International Medical Informatics Association (IMIA) Working Group on Technology Assessment and Quality Development in Health Informatics and the European Federation for Medical Informatics (EFMI) Working Group for Assessment of Health Information Systems.

Results: There is a rich history and tradition of evaluating AI in healthcare. While evaluators can learn from past efforts and build on best-practice evaluation frameworks and methodologies, questions remain about how to evaluate the safety and effectiveness of AI systems that dynamically harness vast amounts of genomic, biomarker, phenotype, electronic record, and care delivery data from across health systems. This paper first provides a historical perspective on the evaluation of AI in healthcare. It then examines key challenges of evaluating AI-enabled clinical decision support during design, development, selection, use, and ongoing surveillance. Practical aspects of evaluating AI in healthcare, including approaches to evaluation and indicators for monitoring AI, are also discussed.

Conclusion: Commitment to rigorous initial and ongoing evaluation will be critical to ensuring the safe and effective integration of AI in complex sociotechnical settings. The specific enhancements required for the new generation of AI-enabled clinical decision support will emerge through practical application.
