A Comparison of Artificial Intelligence and Human Doctors for the Purpose of Triage and Diagnosis

AI virtual assistants have significant potential to alleviate the pressure on overburdened healthcare systems by enabling patients to self-assess their symptoms and to seek further care when appropriate. For these systems to make a meaningful contribution to healthcare globally, they must be trusted by patients and healthcare professionals alike, and must serve the needs of patients in diverse regions and segments of the population. We developed an AI virtual assistant that provides patients with triage and diagnostic information. Crucially, the system is based on a generative model, which allows for relatively straightforward re-parameterization to reflect the local disease and risk factor burden in diverse regions and population segments. This is an appealing property when considering the potential of AI systems to improve the provision of healthcare on a global scale, in both developing and developed countries. We performed a prospective validation study of the accuracy and safety of the AI system and of human doctors. Importantly, we assessed the AI and the human doctors independently against identical clinical cases and, unlike previous studies, also accounted for the information-gathering process of both agents. Overall, we found that the AI system is able to provide patients with triage and diagnostic information with a level of clinical accuracy and safety comparable to that of human doctors. Through this approach and study, we hope to start building trust in AI-powered systems by directly comparing their performance to that of human doctors, who do not always agree with each other on the cause of a patient's symptoms or on the most appropriate triage recommendation.
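
The re-parameterization property mentioned in the abstract can be illustrated with a toy example. The sketch below is not the system described here: it assumes a two-disease naive Bayes model, and the function name `posterior`, the disease and symptom names, and all probabilities are hypothetical values chosen only for illustration. It shows how, in a generative model, the symptom likelihoods can stay fixed while only the disease priors are swapped to reflect a different regional burden.

```python
def posterior(priors, likelihoods, symptoms):
    """Return P(disease | symptoms) for a toy naive Bayes model.

    priors:      relative prior weight of each candidate disease
    likelihoods: P(symptom present | disease) for each disease
    symptoms:    observed symptoms, mapped to True (present) or False (absent)
    """
    scores = {}
    for disease, prior in priors.items():
        score = prior
        for symptom, present in symptoms.items():
            p = likelihoods[disease][symptom]
            score *= p if present else (1.0 - p)
        scores[disease] = score
    # Normalize over the candidate diseases only.
    total = sum(scores.values())
    return {disease: score / total for disease, score in scores.items()}


# Hypothetical symptom likelihoods, shared by both regions.
LIKELIHOODS = {
    "malaria": {"fever": 0.9, "cough": 0.2},
    "influenza": {"fever": 0.8, "cough": 0.8},
}

# Hypothetical regional priors: only these change between deployments.
PRIORS_REGION_A = {"malaria": 0.001, "influenza": 0.05}
PRIORS_REGION_B = {"malaria": 0.05, "influenza": 0.05}

symptoms = {"fever": True, "cough": False}
post_a = posterior(PRIORS_REGION_A, LIKELIHOODS, symptoms)
post_b = posterior(PRIORS_REGION_B, LIKELIHOODS, symptoms)

print("Region A:", post_a)
print("Region B:", post_b)
```

With identical symptoms, the toy model favors influenza where malaria is rare and malaria where it is common, which is the behavior re-parameterization is meant to provide. A realistic model would also need to handle multiple simultaneous diseases and risk factors, which this sketch ignores.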
