Forecasting Zika Incidence in the 2016 Latin America Outbreak Combining Traditional Disease Surveillance with Search, Social Media, and News Report Data

Background: Over 400,000 people across the Americas are thought to have been infected with Zika virus as a consequence of the 2015–2016 Latin American outbreak. Official government-led case count data in Latin America are typically delayed by several weeks, making it difficult to track the disease in a timely manner. Timely disease tracking systems are therefore needed to design and assess interventions to mitigate disease transmission.

Methodology/Principal Findings: We combined information from Zika-related Google searches, Twitter microblogs, and the HealthMap digital surveillance system with historical suspected Zika case counts to estimate and predict weekly suspected Zika cases during the 2015–2016 Latin American outbreak, up to three weeks ahead of the publication of official case data. We evaluated the predictive power of these data and used a dynamic multivariable approach to retrospectively produce predictions of weekly suspected cases for five countries: Colombia, El Salvador, Honduras, Venezuela, and Martinique. Models that combined Google data (and Twitter data where available) with autoregressive information showed the best out-of-sample predictive accuracy for 1-week-ahead predictions, whereas models that used only Google and Twitter data typically performed best for 2- and 3-week-ahead predictions.

Significance: Given the significant delay in the release of official government-reported Zika case counts, we show that these Internet-based data streams can be used as timely and complementary ways to assess the dynamics of the outbreak.
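The combination of autoregressive information with Internet-based signals in a model that is refit each week can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes ordinary least squares on a fixed-length rolling window, and the function name `rolling_forecast`, its parameters, and the simplification that the count for week t is already known at week t are all hypothetical choices made for illustration.

```python
import numpy as np


def rolling_forecast(cases, signals, horizon=1, n_lags=2, window=20, use_ar=True):
    """Retrospective out-of-sample forecasts of weekly case counts.

    cases:   1-D sequence of weekly suspected case counts.
    signals: 2-D array (weeks x streams) of Internet-based signals,
             for example search volume or tweet counts (illustrative inputs).
    At each week t the model is refit on the most recent `window` weeks
    and predicts cases[t + horizon] from data available up to week t.
    Returns a dict mapping target week index to predicted count.
    """
    cases = np.asarray(cases, dtype=float)
    signals = np.asarray(signals, dtype=float)
    n = len(cases)

    def features(t):
        # Predictors known at week t: intercept, current signals, optional AR lags.
        row = [1.0, *signals[t]]
        if use_ar:
            row += [cases[t - k] for k in range(n_lags)]
        return row

    preds = {}
    # Earliest week with a full training window whose targets are already observed.
    start = horizon + window + n_lags - 2
    for t in range(start, n - horizon):
        train_weeks = range(t - horizon - window + 1, t - horizon + 1)
        X = np.array([features(s) for s in train_weeks])
        y = np.array([cases[s + horizon] for s in train_weeks])
        # Assumption: plain least squares stands in for whatever estimator is used.
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        preds[t + horizon] = float(np.dot(features(t), coef))
    return preds
```

Calling `rolling_forecast(cases, signals, horizon=1)` corresponds to the setting where autoregressive terms are combined with the Internet-based signals, while `use_ar=False` with `horizon=2` or `horizon=3` corresponds to a signals-only model of the kind reported above as typically performing best at longer horizons.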
