Reporting guidelines for clinical trial reports for interventions involving artificial intelligence: the CONSORT-AI Extension

Abstract

The CONSORT 2010 (Consolidated Standards of Reporting Trials) statement provides minimum guidelines for reporting randomised trials. Its widespread use has been instrumental in ensuring transparency when evaluating new interventions. More recently, there has been growing recognition that interventions involving artificial intelligence (AI) need to undergo rigorous, prospective evaluation to demonstrate their impact on health outcomes. The CONSORT-AI extension is a new reporting guideline for clinical trials evaluating interventions with an AI component. It was developed in parallel with its companion statement for clinical trial protocols, SPIRIT-AI. Both guidelines were developed through a staged consensus process. First, a literature review and expert consultation generated 29 candidate items. An international multi-stakeholder group then assessed these items in a two-stage Delphi survey (103 stakeholders). The items were agreed on at a two-day consensus meeting (31 stakeholders) and refined through a checklist pilot (34 participants). The CONSORT-AI extension includes 14 new items that were considered important enough for AI interventions to be routinely reported in addition to the core CONSORT 2010 items. CONSORT-AI recommends that investigators clearly describe the AI intervention. This description should cover the instructions and skills required for its use, the setting in which it is integrated, the handling of its inputs and outputs, and the human-AI interaction. Investigators should also provide an analysis of error cases. CONSORT-AI will help promote transparency and completeness in reporting clinical trials of AI interventions. It will help editors, peer reviewers and the general readership to understand, interpret and critically appraise the quality of clinical trial design and the risk of bias in the reported outcomes.