Developing and validating COVID-19 adverse outcome risk prediction models from a bi-national European cohort of 5594 patients

Patients with severe COVID-19 have overwhelmed healthcare systems worldwide. We hypothesized that machine learning (ML) models could be used to predict risks at different stages of management (at diagnosis, hospital admission and intensive care unit (ICU) admission) and thereby provide insights into drivers and prognostic markers of disease progression and death.

From a cohort of approx. 2.6 million citizens in two regions of Denmark, SARS-CoV-2 PCR tests were performed on subjects suspected of having COVID-19; 3944 cases had at least one positive test and were subjected to further analysis. A cohort of SARS-CoV-2 positive cases from the United Kingdom Biobank was used for external validation.

The ML models predicted the risk of death with an area under the receiver operating characteristic curve (ROC-AUC) of 0.904 at diagnosis, 0.818 at hospital admission and 0.723 at ICU admission. Similar metrics were achieved for the predicted risks of hospital admission, ICU admission and use of mechanical ventilation. We identified common risk factors, including age, body mass index (BMI) and hypertension, although the top risk features shifted towards markers of shock and organ dysfunction in ICU patients. The external validation indicated fair performance for mortality prediction but suboptimal performance for predicting ICU admission.

ML may be used to identify drivers of progression to more severe disease and for prognostication in patients with COVID-19. Prognostic features included age, BMI and hypertension, although markers of shock and organ dysfunction became more important in more severe cases. We provide access to an online risk calculator based on these findings.

The study was funded by grants from the Novo Nordisk Foundation to MS (#NNF20SA0062879 and #NNF19OC0055183) and MN (#NNF20SA0062879). The foundation took no part in project design, data handling or manuscript preparation.
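As a reading aid for the ROC-AUC values reported above, the following minimal sketch computes ROC-AUC from its pairwise definition: the probability that a randomly chosen case with the outcome receives a higher predicted risk than a randomly chosen case without it. It is not the study's pipeline. The function name `roc_auc` and the toy labels and risk scores are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC-AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")

    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


# Hypothetical toy data: 1 = died, 0 = survived, with model-predicted risks.
died = [0, 0, 1, 0, 1, 1]
risk = [0.10, 0.35, 0.40, 0.55, 0.80, 0.90]

# 8 of the 9 (positive, negative) pairs are ranked correctly.
print(f"ROC-AUC = {roc_auc(died, risk):.3f}")
```

A value of 0.5 corresponds to ranking no better than chance and 1.0 to perfect separation, which is the scale on which the figures of 0.904, 0.818 and 0.723 should be read.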
