Improved Real-Time Influenza Surveillance: Using Internet Search Data in Eight Latin American Countries

Background: Novel influenza surveillance systems that leverage Internet-based real-time data sources, including Internet search frequencies, social-network information, and crowd-sourced flu surveillance tools, have shown improved accuracy over the past few years in data-rich countries such as the United States. These systems not only track flu activity accurately, but also report flu estimates a week or more ahead of the reports published by healthcare-based systems, such as those implemented and managed by the Centers for Disease Control and Prevention. Previous work has shown that novel flu surveillance systems such as Google Flu Trends (GFT) have not yet delivered acceptable flu estimates in developing countries in Latin America.

Objective: The aim of this study was to show that recent methodological improvements in the use of Internet search engine information to track diseases can lead to improved retrospective flu estimates in multiple countries in Latin America.

Methods: A machine learning-based methodology that uses flu-related Internet search activity and historical information to monitor flu activity, named ARGO (AutoRegression with Google search), was extended to generate flu predictions for 8 Latin American countries (Argentina, Bolivia, Brazil, Chile, Mexico, Paraguay, Peru, and Uruguay) for the period from January 2012 to December 2016. These retrospective (out-of-sample) influenza activity predictions were compared with historically observed suspected flu cases in each country, as reported by FluNet, an influenza surveillance database maintained by the World Health Organization. For a baseline comparison, retrospective (out-of-sample) flu estimates were produced for the same period using autoregressive models that leverage only historical flu activity information.

Results: Our results show that ARGO-like models outperform autoregressive models in 6 of the 8 countries over the 2012-2016 period. Moreover, ARGO significantly improves on the historical flu estimates produced by the now-discontinued GFT for 2012-2015, the period for which GFT information is publicly available.

Conclusions: We demonstrate here that a self-correcting machine learning method, leveraging Internet-based disease-related search activity and historical flu trends, has the potential to produce reliable and timely flu estimates in multiple Latin American countries. This methodology may prove helpful to local public health officials who design and implement interventions aimed at mitigating the effects of influenza outbreaks. Our methodology generally outperforms both the now-discontinued tool GFT and autoregressive methodologies that exploit only historical flu activity to produce future disease estimates.
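The comparison described in the Methods can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes an ARGO-like model can be approximated by an L1-penalised linear regression on lagged flu counts plus same-week search frequencies, refit on a rolling training window, with a lags-only model as the autoregressive baseline. The data are synthetic, and every function name, the window length, the number of lags, and the penalty `alpha` are illustrative assumptions rather than values taken from the study.

```python
import numpy as np


def lasso_cd(X, y, alpha, n_iter=100):
    """L1-penalised least squares fitted by cyclic coordinate descent.

    Minimises (1 / (2n)) * ||y - b0 - X w||^2 + alpha * ||w||_1 on
    standardised features, then maps the coefficients back to raw units.
    """
    n, p = X.shape
    x_mean, x_std = X.mean(axis=0), X.std(axis=0)
    x_std[x_std == 0] = 1.0               # guard against constant columns
    Z = (X - x_mean) / x_std
    y_mean = y.mean()
    r = y - y_mean                        # residual when all weights are zero
    w = np.zeros(p)
    for _ in range(n_iter):
        for j in range(p):
            r += Z[:, j] * w[j]           # drop feature j from the fit
            rho = Z[:, j] @ r / n
            w[j] = np.sign(rho) * max(abs(rho) - alpha, 0.0)  # soft threshold
            r -= Z[:, j] * w[j]           # add feature j back
    coef = w / x_std
    return y_mean - x_mean @ coef, coef


def make_synthetic(n_weeks=208, n_terms=8, seed=0):
    """Synthetic weekly flu counts and search frequencies (illustration only)."""
    rng = np.random.default_rng(seed)
    t = np.arange(n_weeks)
    flu = 100 + 60 * np.sin(2 * np.pi * t / 52) + rng.normal(0, 10, n_weeks)
    # First half of the terms track flu activity; the rest are pure noise.
    searches = np.column_stack([
        flu * rng.uniform(0.5, 1.5) + rng.normal(0, 3, n_weeks)
        if k < n_terms // 2 else rng.normal(0, 10, n_weeks)
        for k in range(n_terms)
    ])
    return flu, searches


def design_matrix(flu, searches, n_lags):
    """Rows are weeks; columns are flu lags 1..n_lags, then search terms."""
    lags = np.array([flu[t - n_lags:t][::-1] for t in range(n_lags, len(flu))])
    X = lags if searches is None else np.hstack([lags, searches[n_lags:]])
    return X, flu[n_lags:]


def rolling_nowcast(flu, searches, n_lags=8, window=104, alpha=1.0):
    """Out-of-sample estimates: refit each week on the preceding `window` weeks."""
    X, y = design_matrix(flu, searches, n_lags)
    preds = []
    for t in range(window, len(y)):
        b0, w = lasso_cd(X[t - window:t], y[t - window:t], alpha)
        preds.append(b0 + X[t] @ w)
    return np.array(preds), y[window:]


def rmse(pred, truth):
    return float(np.sqrt(np.mean((pred - truth) ** 2)))


flu, searches = make_synthetic()
argo_pred, truth = rolling_nowcast(flu, searches)   # lags + search terms
ar_pred, _ = rolling_nowcast(flu, None)             # lags only (baseline)
rmse_argo, rmse_ar = rmse(argo_pred, truth), rmse(ar_pred, truth)
print(f"ARGO-like RMSE: {rmse_argo:.2f}  AR baseline RMSE: {rmse_ar:.2f}")
```

Because the synthetic search terms are built to track flu activity closely, the search-augmented model is expected to beat the lags-only baseline here. That outcome reflects how the toy data were constructed and says nothing about the country-level results reported in the study.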
