Using web search queries to monitor influenza-like illness: an exploratory retrospective analysis, Netherlands, 2017/18 influenza season

Background: Despite the early development of Google Flu Trends in 2009, standards for digital epidemiology methods have not been established and research from European countries is scarce.

Aim: In this article, we study the use of web search queries to monitor influenza-like illness (ILI) rates in the Netherlands in real time.

Methods: In this retrospective analysis, we simulated the weekly use of a prediction model for estimating the then-current ILI incidence across the 2017/18 influenza season, based solely on Google search query data. For each week, we used the weekly ILI data as reported to The European Surveillance System (TESSY) up to that week and removed the then-last 4 weeks from our dataset. We then fitted a prediction model based on the then-most-recent search query data from Google Trends to fill the 4-week gap ('nowcasting'). Lasso regression, in combination with cross-validation, was applied to select predictors and to fit the 52 models, one for each week of the season.

Results: The models provided accurate predictions, with a mean and maximum absolute error of 1.40 (95% confidence interval: 1.09–1.75) and 6.36 per 10,000 population, respectively. The onset, peak and end of the epidemic were predicted with an error of 1, 3 and 2 weeks, respectively. The number of search terms retained as predictors ranged from three to five, with one keyword, 'griep' ('flu'), having the most weight in all models.

Discussion: This study demonstrates the feasibility of accurate, real-time ILI incidence predictions in the Netherlands using Google search query data.
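The nowcasting procedure outlined in the Methods can be illustrated with a minimal sketch. It is not the authors' implementation: the data are synthetic, the penalty grid, fold count, series length and all function names (`fit_lasso`, `choose_lambda`, `predict`) are assumptions made for illustration, and the lasso is fitted with a plain coordinate-descent routine rather than any particular statistics package. For each simulated week, the ILI values of the then-last 4 weeks are withheld, a lasso model is fitted on the remaining weeks with a cross-validated penalty, and the then-current week is predicted from search volumes alone.

```python
import math
import random


def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the lasso update step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_lasso(X, y, lam, n_iter=100):
    """Minimise (1/2n) * RSS + lam * sum(|beta_j|) by cyclic coordinate descent.

    Columns are centred internally, so the intercept is left unpenalised.
    """
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    Xc = [[row[j] - means[j] for j in range(p)] for row in X]
    y_mean = sum(y) / n
    resid = [yi - y_mean for yi in y]
    col_sq = [sum(row[j] ** 2 for row in Xc) / n for j in range(p)]
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Covariance of feature j with the residual that excludes feature j.
            rho = sum(Xc[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new_beta = soft_threshold(rho, lam) / col_sq[j]
            delta = new_beta - beta[j]
            for i in range(n):
                resid[i] -= delta * Xc[i][j]
            beta[j] = new_beta
    intercept = y_mean - sum(m * b for m, b in zip(means, beta))
    return intercept, beta


def predict(intercept, beta, row):
    return intercept + sum(b * x for b, x in zip(beta, row))


def choose_lambda(X, y, lambdas, k=3):
    """Pick the penalty with the lowest k-fold cross-validated absolute error."""
    best_lam, best_err = lambdas[0], float("inf")
    for lam in lambdas:
        err = 0.0
        for fold in range(k):
            train = [i for i in range(len(y)) if i % k != fold]
            held_out = [i for i in range(len(y)) if i % k == fold]
            b0, beta = fit_lasso([X[i] for i in train], [y[i] for i in train], lam)
            err += sum(abs(y[i] - predict(b0, beta, X[i])) for i in held_out)
        if err < best_err:
            best_lam, best_err = lam, err
    return best_lam


# Synthetic stand-in data (assumption): one informative search term, two noise terms.
rng = random.Random(0)
N_WEEKS, GAP, FIRST_NOWCAST = 40, 4, 20
LAMBDAS = [0.01, 0.1, 1.0]  # hypothetical penalty grid
ili = [5 + 20 * math.exp(-(((w - 25) / 5) ** 2)) + rng.gauss(0, 0.5) for w in range(N_WEEKS)]
searches = [[3 * ili[w] + rng.gauss(0, 3), rng.uniform(0, 100), rng.uniform(0, 100)]
            for w in range(N_WEEKS)]

errors = []
for t in range(FIRST_NOWCAST, N_WEEKS):
    known = t + 1 - GAP  # ILI for the then-last 4 weeks (t-3 .. t) is withheld
    X_train, y_train = searches[:known], ili[:known]
    lam = choose_lambda(X_train, y_train, LAMBDAS)
    b0, beta = fit_lasso(X_train, y_train, lam)
    # Nowcast the then-current week from search data only.
    errors.append(abs(ili[t] - predict(b0, beta, searches[t])))

mean_abs_error = sum(errors) / len(errors)
max_abs_error = max(errors)
print(f"mean absolute error: {mean_abs_error:.2f}, maximum: {max_abs_error:.2f}")
```

Refitting the model every week, rather than fitting once, mirrors the simulated real-time use described above: each nowcast only sees data that would have been available at that point in the season. The error figures printed by this sketch come from synthetic data and are not comparable to the results reported in the study.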
