Enhanced COVID-19 data for improved prediction of survival

The current COVID-19 pandemic, caused by the rapid worldwide spread of the SARS-CoV-2 virus, is having severe consequences for human health and the world economy. The virus affects individuals quite differently: many infected patients show only mild symptoms, while others become critically ill. To lessen the impact of the pandemic, one important question is which factors predict the death of a patient. Here, we construct an enhanced COVID-19 dataset by processing two existing databases (from Kaggle and WHO) and using natural language processing methods to add local weather conditions and research sentiment.

Author summary

In this study, we contribute an enhanced COVID-19 dataset, which contains 183 samples and 43 features. Applying Extreme Gradient Boosting (XGBoost) to the enhanced dataset achieves 95% accuracy in predicting patient survival; country-wise research sentiment is the most important feature, followed by age and local weather. All data and source code are available at http://ab.inf.uni-tuebingen.de/publications/papers/COVID-19.
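The boosted-tree classification and feature ranking described above can be illustrated with a minimal sketch. This is not the authors' code and not XGBoost itself: it is plain gradient boosting with depth-1 trees (stumps) and logistic loss, written with the Python standard library only. XGBoost additionally uses second-order gradients, regularization, and deeper trees. The data is synthetic (183 rows to mirror the sample count, but only 5 anonymous features, with the label driven by feature 0 by construction), and all function names are hypothetical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(X, residuals):
    """Return the single-feature split that most reduces squared error on residuals."""
    n, d = len(X), len(X[0])
    total = sum(residuals)
    best = None  # (gain, feature, threshold, left_value, right_value)
    for j in range(d):
        order = sorted(range(n), key=lambda i: X[i][j])
        left_sum = 0.0
        for k in range(1, n):
            left_sum += residuals[order[k - 1]]
            lo, hi = X[order[k - 1]][j], X[order[k]][j]
            if lo == hi:
                continue  # cannot split between identical values
            right_sum = total - left_sum
            # Reduction in squared error achieved by this split.
            gain = left_sum ** 2 / k + right_sum ** 2 / (n - k) - total ** 2 / n
            if best is None or gain > best[0]:
                best = (gain, j, (lo + hi) / 2.0, left_sum / k, right_sum / (n - k))
    return best


def fit_boosted_stumps(X, y, n_rounds=30, learning_rate=0.3):
    """Gradient boosting for binary labels; also accumulates gain per feature."""
    n = len(X)
    p = min(max(sum(y) / n, 1e-6), 1 - 1e-6)
    bias = math.log(p / (1 - p))  # start from the log-odds of the base rate
    scores = [bias] * n
    stumps = []
    importance = [0.0] * len(X[0])
    for _ in range(n_rounds):
        # Negative gradient of the logistic loss with respect to the score.
        residuals = [y[i] - sigmoid(scores[i]) for i in range(n)]
        stump = fit_stump(X, residuals)
        if stump is None:
            break
        gain, j, threshold, left, right = stump
        importance[j] += gain
        stumps.append((j, threshold, learning_rate * left, learning_rate * right))
        for i in range(n):
            scores[i] += learning_rate * (left if X[i][j] <= threshold else right)
    return bias, stumps, importance


def predict_proba(bias, stumps, row):
    score = bias + sum(l if row[j] <= t else r for j, t, l, r in stumps)
    return sigmoid(score)


# Synthetic stand-in data (an assumption, not the paper's dataset).
random.seed(0)
N_SAMPLES, N_FEATURES = 183, 5
X = [[random.gauss(0, 1) for _ in range(N_FEATURES)] for _ in range(N_SAMPLES)]
y = [1 if row[0] + 0.3 * random.gauss(0, 1) > 0 else 0 for row in X]

# Simple hold-out split so accuracy is measured on unseen rows.
X_train, y_train = X[:140], y[:140]
X_test, y_test = X[140:], y[140:]

bias, stumps, importance = fit_boosted_stumps(X_train, y_train)
predictions = [1 if predict_proba(bias, stumps, row) >= 0.5 else 0 for row in X_test]
accuracy = sum(p == t for p, t in zip(predictions, y_test)) / len(y_test)
top_feature = max(range(N_FEATURES), key=lambda j: importance[j])

print("hold-out accuracy:", round(accuracy, 3))
print("most important feature index:", top_feature)
```

With the real dataset, one would more likely use the `xgboost` package's `XGBClassifier` and read its `feature_importances_` attribute; the sketch only shows the underlying idea of fitting small trees to residuals and ranking features by accumulated split gain.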
