Systematic Review and Comparison of Publicly Available ICU Data Sets—A Decision Guide for Clinicians and Data Scientists

OBJECTIVE: As data science and artificial intelligence continue to gain traction rapidly, the publication of freely available ICU datasets has become invaluable for propelling data-driven clinical research. In this guide for clinicians and researchers, we aim to: 1) systematically search for and identify all publicly available adult clinical ICU datasets, 2) compare their characteristics, data quality, and richness, and critically appraise their strengths and weaknesses, and 3) suggest which datasets are appropriate for answering a given clinical question. DATA SOURCES: A systematic search was performed in PubMed, arXiv, medRxiv, and bioRxiv. STUDY SELECTION: We selected all studies that reported on publicly available adult patient-level intensive care datasets. DATA EXTRACTION: A total of four publicly available, adult, critical care, patient-level databases were included (Amsterdam University Medical Centers Database [AmsterdamUMCdb], eICU Collaborative Research Database [eICU-CRD], High time-resolution intensive care unit dataset [HiRID], and Medical Information Mart for Intensive Care-IV [MIMIC-IV]). Databases were compared using a priori defined categories, including demographics, patient characteristics, and data richness. The study protocol and search strategy were prospectively registered. DATA SYNTHESIS: Four ICU databases fulfilled all criteria for inclusion; they were queried using SQL (PostgreSQL version 12; PostgreSQL Global Development Group) and analyzed using R (R Foundation for Statistical Computing, Vienna, Austria). The number of unique patient admissions ranged from 23,106 (AmsterdamUMCdb) to 200,859 (eICU-CRD). Laboratory values and vital signs were recorded most frequently in HiRID, for example, 5.2 (±3.4) lactate values per day and 29.7 (±10.2) systolic blood pressure values per hour.
Treatment intensity varied: vasopressor and ventilatory support were provided to 69.0% and 83.0% of patients, respectively, in AmsterdamUMCdb, versus 12.0% and 21.0% in eICU-CRD. ICU mortality ranged from 5.5% in eICU-CRD to 9.9% in AmsterdamUMCdb. CONCLUSIONS: We identified four publicly available adult clinical ICU datasets. Sample size, severity of illness, treatment intensity, and frequency of reported parameters differ markedly between the databases. These differences should guide clinicians and researchers in selecting the database best suited to their clinical questions.
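Frequencies such as "5.2 (±3.4) lactate values per day" can be read as a mean and spread of per-stay measurement rates. The sketch below shows one way to compute such a figure, assuming that each ICU stay contributes a single rate (measurement count divided by ICU length of stay), that the spread is a standard deviation, and that stays with zero length of stay are excluded. The `Stay` record and `measurement_frequency` function are hypothetical illustrations; the review's exact aggregation (per stay, per patient, or pooled) and the underlying SQL queries are not described here.

```python
from dataclasses import dataclass
from statistics import mean, stdev


@dataclass
class Stay:
    """Hypothetical per-stay summary, e.g. produced by a SQL query per database."""
    stay_id: int
    los_hours: float      # ICU length of stay in hours
    n_measurements: int   # number of values of one parameter (e.g. lactate) in the stay


def measurement_frequency(stays, per_hours):
    """Return (mean, SD) of per-stay measurement rates per `per_hours` hours.

    Assumption: one rate per stay; stays with non-positive length of stay are skipped.
    Use per_hours=24 for "values per day" and per_hours=1 for "values per hour".
    """
    # Multiply before dividing to keep the arithmetic exact for round inputs.
    rates = [s.n_measurements * per_hours / s.los_hours
             for s in stays if s.los_hours > 0]
    if len(rates) < 2:
        raise ValueError("need at least two stays with positive length of stay")
    return mean(rates), stdev(rates)


if __name__ == "__main__":
    # Toy data for illustration only, not taken from any of the databases.
    toy_stays = [Stay(1, 48.0, 10), Stay(2, 24.0, 6), Stay(3, 72.0, 12)]
    m, sd = measurement_frequency(toy_stays, per_hours=24)
    print(f"{m:.1f} (±{sd:.1f}) values per day")
```

Averaging per-stay rates weights every stay equally; pooling all measurements over total ICU time would instead weight long stays more heavily, so the two approaches can give noticeably different figures for the same database.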