Is a COVID-19 Second Wave Possible in Emilia-Romagna (Italy)? Forecasting a Future Outbreak with Particulate Pollution and Machine Learning

The Nobel laureate Niels Bohr once said: “Predictions are very difficult, especially if they are about the future”. Even so, models that can forecast future COVID-19 outbreaks are receiving special attention from policymakers and health authorities, who aim to put control measures in place before infections begin to increase. Two main problems emerge, however. First, there is no general agreement on which kind of data should be recorded to judge whether the virus is resurging (e.g., infections, deaths, percentage of hospitalizations, reports from clinicians, signals from social media). Moreover, all these data suffer from common defects, linked to reporting delays and to uncertainties in the collection process. Second, the complex nature of COVID-19 outbreaks makes it difficult to tell whether traditional epidemiological models, such as the susceptible-infectious-recovered (SIR) model, are more effective for the timely prediction of an outbreak than alternative computational models. Well aware of the complexity of this forecasting problem, we propose here an innovative metric for predicting COVID-19 diffusion, based on the hypothesis that a relation exists between the spread of the virus and the presence in the air of pollutants such as PM2.5, PM10, and NO2. Drawing on the recent claim by 239 experts that this virus can be airborne, and further considering that particulate matter may favor this airborne route, we developed a machine learning (ML) model trained on: (i) all the COVID-19 infections that occurred in the Italian region of Emilia-Romagna, one of the most polluted areas in Europe, in the period February–July 2020; (ii) the daily values of all these pollutants recorded in the same period and in the same region; and (iii) the chronology of the restrictions imposed by the Italian Government on human activities.
Our ML model was then subjected to a classic ten-fold cross-validation procedure, which returned a promising accuracy of 90%. Finally, the model was used to predict a possible resurgence of the virus in all nine provinces of Emilia-Romagna in the period September–December 2020. To make those predictions, the inputs to our ML model were the daily measurements of the aforementioned pollutants recorded in September–December of 2017, 2018, and 2019, along with the hypothesis that the mild containment measures taken in Italy in the so-called Phase 3 are obeyed. At the time of writing, we cannot confirm the precision of our predictions. Nevertheless, we are projecting a scenario based on an original hypothesis that makes our COVID-19 prediction model unique in the world. Its accuracy will soon be judged by history, and this, too, is science at the service of society.
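
As an illustration only, the ten-fold cross-validation procedure mentioned above can be sketched in a few lines of standard-library Python. Everything in this sketch is an assumption made for the example: the data are synthetic, the feature layout (PM2.5, PM10, NO2, and a restriction level) and the binary label are hypothetical, and the nearest-centroid classifier is a simple stand-in, not the ML model used in this work. The accuracy it produces therefore says nothing about the 90% figure reported here.

```python
import random

def k_fold_indices(n_samples, k, seed=0):
    """Shuffle sample indices and split them into k nearly equal folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def fit_centroids(X, y):
    """Stand-in classifier (assumption): mean feature vector per class."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}

def predict(centroids, row):
    """Assign the class whose centroid is closest in squared distance."""
    def dist2(c):
        return sum((a - b) ** 2 for a, b in zip(row, centroids[c]))
    return min(centroids, key=dist2)

def cross_validate(X, y, k=10, seed=0):
    """Train on k-1 folds, score on the held-out fold, repeat k times."""
    scores = []
    for held_out in k_fold_indices(len(X), k, seed):
        test = set(held_out)
        train = [i for i in range(len(X)) if i not in test]
        model = fit_centroids([X[i] for i in train], [y[i] for i in train])
        hits = sum(predict(model, X[i]) == y[i] for i in held_out)
        scores.append(hits / len(held_out))
    return scores

def make_synthetic(n=200, seed=1):
    """Synthetic daily records (hypothetical): [PM2.5, PM10, NO2, restriction]."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)       # hypothetical "infections rising" flag
        shift = 15.0 * label            # artificial signal, for the demo only
        X.append([
            rng.gauss(20.0 + shift, 5.0),   # PM2.5
            rng.gauss(30.0 + shift, 6.0),   # PM10
            rng.gauss(25.0 + shift, 5.0),   # NO2
            float(rng.randint(0, 3)),       # restriction level
        ])
        y.append(label)
    return X, y

X, y = make_synthetic()
scores = cross_validate(X, y, k=10)
mean_accuracy = sum(scores) / len(scores)
print(f"per-fold accuracy: {[round(s, 2) for s in scores]}")
print(f"mean accuracy over 10 folds: {mean_accuracy:.2f}")
```

Each record is held out exactly once, so the mean of the ten per-fold scores estimates how the classifier would perform on data it has not seen during training.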
