Faster indicators of dengue fever case counts using Google and Twitter

Dengue is a major threat to public health in Brazil, the world’s sixth biggest country by population, with over 1.5 million cases recorded in 2019 alone. Official data on dengue case counts is delivered incrementally and, for many reasons, is often subject to delays of weeks. In contrast, data on dengue-related Google searches and Twitter messages is available in full with no delay. Here, we describe a model that uses online data to deliver improved weekly estimates of dengue incidence in Rio de Janeiro. We address a key shortcoming of previous disease surveillance models based on online data by explicitly accounting for the incremental delivery of case count data, which ensures that our approach can be used in practice. We also draw on Google Trends and Twitter data in tandem, and show that this leads to slightly better estimates than a model using either data stream alone. Our results provide evidence that online data can improve both the accuracy and precision of rapid estimates of disease incidence, even where the underlying case count data is subject to long and varied delays.
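The central difficulty described above is that a given week's case count keeps growing as late reports arrive. The sketch below illustrates only that delay-correction idea, under stated assumptions: it uses a hypothetical "reporting triangle" layout (rows are weeks, columns are reporting delays) and a simple chain-ladder-style scaling. It is not the authors' model, and it ignores the Google Trends and Twitter signals entirely; the names `reporting_fractions` and `nowcast` are illustrative.

```python
from typing import List, Optional

# Hypothetical layout: triangle[t][d] is the number of cases for week t that
# were reported d weeks later; None marks cells not yet delivered.
Triangle = List[List[Optional[float]]]


def reporting_fractions(triangle: Triangle, max_delay: int) -> List[float]:
    """Estimate the cumulative fraction of a week's cases reported by each
    delay d, using only weeks already complete up to max_delay."""
    complete = [row for row in triangle
                if all(x is not None for x in row[: max_delay + 1])]
    if not complete:
        raise ValueError("need at least one fully reported week")
    total = sum(sum(row[: max_delay + 1]) for row in complete)
    fractions = []
    running = 0.0
    for d in range(max_delay + 1):
        running += sum(row[d] for row in complete)
        fractions.append(running / total)
    return fractions


def nowcast(triangle: Triangle, max_delay: int) -> List[Optional[float]]:
    """Scale each week's partial count by the fraction expected to have been
    reported so far. Returns None for weeks with no usable data yet."""
    fractions = reporting_fractions(triangle, max_delay)
    estimates: List[Optional[float]] = []
    for row in triangle:
        # Take the contiguous prefix of delivered cells (delays 0, 1, ...).
        observed = []
        for x in row[: max_delay + 1]:
            if x is None:
                break
            observed.append(x)
        if not observed or fractions[len(observed) - 1] == 0:
            estimates.append(None)
            continue
        estimates.append(sum(observed) / fractions[len(observed) - 1])
    return estimates


if __name__ == "__main__":
    triangle = [
        [10, 5, 5],        # fully reported week
        [20, 10, 10],      # fully reported week
        [15, 6, None],     # two weeks of reports so far
        [12, None, None],  # current week: first report only
    ]
    print(nowcast(triangle, max_delay=2))
```

In this toy data, about 75% of a week's cases are reported within one week of delay, so the 21 cases seen so far for the third week are scaled up to 28. A fuller approach would typically treat the not-yet-reported counts probabilistically and add online signals as covariates, so that it yields prediction intervals as well as point estimates.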
