Assessment of an AI Aid in Detection of Adult Appendicular Skeletal Fractures by Emergency Physicians and Radiologists: A Multicenter Cross-sectional Diagnostic Study.

Background: The interpretation of radiographs suffers from an ever-increasing workload in emergency and radiology departments, while missed fractures represent up to 80% of diagnostic errors in the emergency department.

Purpose: To assess the performance of an artificial intelligence (AI) system designed to aid radiologists and emergency physicians in the detection and localization of appendicular skeletal fractures.

Materials and Methods: The AI system was previously trained on 60 170 radiographs obtained in patients with trauma. The radiographs were randomly split into training (70%), validation (10%), and test (20%) sets. Between 2016 and 2018, 600 adult patients in whom multiview radiographs had been obtained after a recent trauma, with or without one or more fractures of the shoulder, arm, hand, pelvis, leg, or foot, were retrospectively included from 17 French medical centers. Radiographs whose quality precluded human interpretation or that contained only obvious fractures were excluded. Six radiologists and six emergency physicians were asked to detect and localize fractures with (n = 300) and without (n = 300) the aid of software that highlights boxes around AI-detected fractures. Aided and unaided sensitivity, specificity, and reading times were compared by means of paired Student t tests after the performance of each reader was averaged.

Results: A total of 600 patients (mean age ± standard deviation, 57 years ± 22; 358 women) were included. The AI aid improved the sensitivity of physicians by 8.7% (95% CI: 3.1, 14.2; P = .003 for superiority) and the specificity by 4.1% (95% CI: 0.5, 7.7; P < .001 for noninferiority). It also reduced the average number of false-positive fractures per patient by 41.9% (95% CI: 12.8, 61.3; P = .02) in patients without fractures and the mean reading time by 15.0% (95% CI: -30.4, 3.8; P = .12). Finally, the stand-alone performance of a newer release of the AI system was greater than that of all unaided readers, including skeletal expert radiologists, with an area under the receiver operating characteristic curve of 0.94 (95% CI: 0.92, 0.96).

Conclusion: The AI aid provided a gain in sensitivity (8.7% increase) and in specificity (4.1% increase) without a loss of reading speed.
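The core comparison of the reader study, in which each reader serves as their own control and aided versus unaided performance is compared with a paired Student t test, can be illustrated with a minimal Python sketch. All per-reader values below are hypothetical placeholders, not the study data.

# Minimal sketch of the reader-averaged paired analysis described above.
# All numbers below are hypothetical; they are NOT the study data.
import numpy as np
from scipy import stats

# Hypothetical per-reader sensitivity (fraction of fractures detected),
# one value per reader for the unaided and the AI-aided sessions.
unaided = np.array([0.61, 0.68, 0.72, 0.64, 0.70, 0.66,   # 6 radiologists
                    0.55, 0.59, 0.63, 0.57, 0.60, 0.58])  # 6 emergency physicians
aided = unaided + np.array([0.09, 0.07, 0.05, 0.10, 0.06, 0.08,
                            0.11, 0.09, 0.10, 0.08, 0.12, 0.09])

# Paired Student t test across readers (each reader is their own control).
t_stat, p_value = stats.ttest_rel(aided, unaided)
print(f"mean gain in sensitivity: {np.mean(aided - unaided):.3f}")
print(f"paired t test: t = {t_stat:.2f}, P = {p_value:.4f}")

The same paired construction applies to specificity and reading time; only the per-reader metric changes.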

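The stand-alone ROC analysis reported for the newer AI release can be sketched in the same spirit. Here roc_auc_score and a percentile bootstrap stand in for the study's actual evaluation pipeline, and the labels and confidence scores are simulated, not the study data.

# Minimal sketch of a stand-alone ROC analysis with a bootstrap 95% CI
# for the AUC. Labels and scores are simulated, NOT the study data.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 600
y_true = rng.integers(0, 2, size=n)                               # 1 = fracture present
y_score = np.clip(y_true * 0.6 + rng.normal(0.3, 0.25, n), 0, 1)  # AI confidence

auc = roc_auc_score(y_true, y_score)

# Percentile bootstrap over patients for a 95% CI.
boot = []
for _ in range(2000):
    idx = rng.integers(0, n, size=n)
    if len(np.unique(y_true[idx])) == 2:  # resample must contain both classes
        boot.append(roc_auc_score(y_true[idx], y_score[idx]))
print(f"AUC = {auc:.2f} (95% CI: {np.percentile(boot, 2.5):.2f}, "
      f"{np.percentile(boot, 97.5):.2f})")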