Using machine learning of clinical data to diagnose COVID-19: a systematic review and meta-analysis

Background: The recent Coronavirus Disease 2019 (COVID-19) pandemic has placed severe stress on healthcare systems worldwide, a strain amplified by the critical shortage of COVID-19 tests.

Methods: In this study, we propose a more accurate diagnostic model of COVID-19 based on patient symptoms and routine test results, built by applying machine learning to reanalyze COVID-19 data from 151 published studies. We aim to investigate correlations between clinical variables, cluster COVID-19 patients into subtypes, and generate a computational classification model that discriminates between COVID-19 patients and influenza patients based on clinical variables alone.

Results: We discovered several novel associations between clinical variables, including correlations between male sex and higher serum levels of lymphocytes and neutrophils. We found that COVID-19 patients could be clustered into subtypes based on serum levels of immune cells, gender, and reported symptoms. Finally, we trained an XGBoost model that achieved a sensitivity of 92.5% and a specificity of 97.9% in discriminating COVID-19 patients from influenza patients.

Conclusions: We demonstrated that computational methods trained on large clinical datasets could yield increasingly accurate COVID-19 diagnostic models that mitigate the impact of the lack of testing. We also presented previously unknown correlations among COVID-19 clinical variables and previously unknown clinical subgroups.
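The abstract reports the classifier's performance as sensitivity and specificity. It also describes correlations between clinical variables such as sex and serum immune-cell levels. The Python sketch below shows how these quantities are commonly computed. It does not reproduce the authors' XGBoost pipeline, and it rests on several assumptions:

* Label 1 means COVID-19 and 0 means influenza.
* The correlation measure is Pearson correlation, because the abstract does not name one. With a 0/1-coded variable such as male sex, Pearson correlation equals the point-biserial correlation.
* The function names `sensitivity_specificity` and `pearson` are hypothetical.
* The toy values are made up and are not data from the study.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Hypothetical helper: return (sensitivity, specificity).

    Assumes label 1 = COVID-19 (positive class) and 0 = influenza (negative class).
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative true label")
    # Sensitivity: share of true COVID-19 cases that the model flags as COVID-19.
    sensitivity = tp / (tp + fn)
    # Specificity: share of true influenza cases that the model correctly rejects.
    specificity = tn / (tn + fp)
    return sensitivity, specificity


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Hypothetical helper: Pearson correlation coefficient.

    When x is coded 0/1 (e.g. 1 = male), this equals the point-biserial correlation.
    Choosing Pearson here is an assumption; the abstract does not name the measure.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return cov / (ss_x ** 0.5 * ss_y ** 0.5)


if __name__ == "__main__":
    # Made-up toy labels and predictions, not data from the study.
    truth = [1, 1, 1, 1, 0, 0, 0, 0]
    preds = [1, 1, 1, 0, 0, 0, 0, 0]
    print(sensitivity_specificity(truth, preds))

    # Made-up toy values: sex coded 1 = male, 0 = female, with an arbitrary lab value.
    sex = [1, 1, 0, 0]
    lymphocytes = [2.0, 1.8, 1.2, 1.0]
    print(pearson(sex, lymphocytes))
```

In these terms, the reported 92.5% is the share of true COVID-19 cases the model flagged, and the reported 97.9% is the share of true influenza cases it correctly rejected, both computed on the authors' evaluation data.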
