MIMIC-CXR-JPG, a large publicly available database of labeled chest radiographs

Chest radiography is an extremely powerful imaging modality, allowing for detailed inspection of a patient's thorax, but it requires specialized training for proper interpretation. With the advent of high-performance, general-purpose computer vision algorithms, accurate automated analysis of chest radiographs is of increasing interest to researchers. However, a key challenge in developing these techniques is the lack of sufficient data. Here we describe MIMIC-CXR-JPG v2.0.0, a large dataset of 377,110 chest x-rays associated with 227,827 imaging studies sourced from the Beth Israel Deaconess Medical Center between 2011 and 2016. Images are provided with 14 labels derived from two natural language processing tools applied to the corresponding free-text radiology reports. MIMIC-CXR-JPG is derived entirely from the MIMIC-CXR database and aims to provide a convenient processed version of MIMIC-CXR, as well as a standard reference for data splits and image labels. All images have been de-identified to protect patient privacy. The dataset is made freely available to facilitate and encourage a wide range of research in medical computer vision.
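As an illustration of how report-derived labels of this kind might be consumed, the following minimal sketch tallies label states per finding from a small in-memory table. Everything specific in it is an assumption rather than part of the description above: the rows are made up, the three finding columns are only examples, and the encoding (1.0 positive, 0.0 negative, -1.0 uncertain, blank for not mentioned) follows the convention commonly used by report labelers and should be checked against the dataset documentation before use.

```python
import csv
import io
from collections import Counter

# Hypothetical excerpt of a per-study label table (made-up rows, not real data).
# Assumed encoding: 1.0 positive, 0.0 negative, -1.0 uncertain, blank = not mentioned.
SAMPLE_CSV = """subject_id,study_id,Atelectasis,Edema,Pneumonia
1,100,1.0,,0.0
1,101,,-1.0,
2,102,0.0,1.0,-1.0
"""

# Map the assumed raw cell values to readable label states.
STATUS = {
    "1.0": "positive",
    "0.0": "negative",
    "-1.0": "uncertain",
    "": "not mentioned",
}


def summarize_labels(csv_text, id_columns=("subject_id", "study_id")):
    """Count how often each label state occurs for every finding column."""
    counts = {}
    reader = csv.DictReader(io.StringIO(csv_text))
    for row in reader:
        for column, value in row.items():
            if column in id_columns:
                continue  # identifiers are not labels
            state = STATUS[value.strip()]
            counts.setdefault(column, Counter())[state] += 1
    return counts


summary = summarize_labels(SAMPLE_CSV)
for label, counter in summary.items():
    print(label, dict(counter))
```

Keeping "uncertain" and "not mentioned" as distinct states, rather than collapsing them to negative, leaves the choice of how to treat them to the downstream model.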
