Accurate regional influenza epidemics tracking using Internet search data

Accurate, high-resolution tracking of influenza epidemics at the regional level helps public health agencies make informed, proactive decisions, especially during outbreaks. Internet users’ online searches offer great potential for regional influenza tracking. However, because Internet data at the regional level have a complex structure and lower quality, few established methods perform satisfactorily. In this article, we propose a novel method, ARGO2 (2-step Augmented Regression with GOogle data), that efficiently combines publicly available Google search data at different resolutions (national and regional) with traditional influenza surveillance data from the Centers for Disease Control and Prevention (CDC) for accurate, real-time regional tracking of influenza. ARGO2 performs competitively across all US regions compared with available Internet-data-based regional influenza tracking methods, and it achieved a 30% error reduction over the best alternative method that we numerically tested for the period from March 2009 to March 2018. ARGO2 is reliable and robust, and it has the flexibility to incorporate additional information from other sources and resolutions, making it a powerful tool for regional influenza tracking and potentially for tracking other social, economic, or public health events at the regional or local level.
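
The abstract does not spell out the two steps, so the following is only an illustrative sketch of the general idea of a two-stage regression that mixes regional and national search signals with lagged surveillance data. It is not the ARGO2 algorithm itself: the use of ridge regression, the function names `ridge_fit` and `two_step_estimate`, and the synthetic inputs are all assumptions made for illustration.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge regression with an unpenalized intercept (assumed estimator)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ yc)
    intercept = y_mean - x_mean @ beta
    return beta, intercept

def two_step_estimate(ili_lag, search_regional, search_national, ili, lam=1.0):
    """Hypothetical two-stage, in-sample estimate for a single region."""
    # Step 1: regress regional ILI on its own lag and the regional search signal.
    X1 = np.column_stack([ili_lag, search_regional])
    beta1, a1 = ridge_fit(X1, ili, lam)
    step1 = X1 @ beta1 + a1
    # Step 2: refine the step-1 estimate using the national-resolution search signal.
    X2 = np.column_stack([step1, search_national])
    beta2, a2 = ridge_fit(X2, ili, lam)
    step2 = X2 @ beta2 + a2
    return step1, step2

# Synthetic weekly data, invented purely to exercise the functions.
rng = np.random.default_rng(0)
n = 120
national = rng.normal(size=n)
regional = national + 0.5 * rng.normal(size=n)   # noisier regional signal
ili = 2.0 + 0.8 * national + 0.3 * rng.normal(size=n)

# Align a one-week lag of the surveillance series with the current week.
target = ili[1:]
step1, step2 = two_step_estimate(ili[:-1], regional[1:], national[1:], target)
print("step-1 MSE:", np.mean((step1 - target) ** 2))
print("step-2 MSE:", np.mean((step2 - target) ** 2))
```

A real application would fit on a training window and evaluate out of sample; the in-sample fit here only keeps the sketch short.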
