Use Internet Search Data to Accurately Track State-Level Influenza Epidemics

For epidemic control and prevention, timely insight into potential hot spots is invaluable. As an alternative to traditional epidemic surveillance, which often lags behind real time by weeks, big data from the Internet provide important information about current epidemic trends. Here we present a methodology, ARGOX (Augmented Regression with GOogle data CROSS space), for accurate real-time tracking of state-level influenza epidemics in the United States. ARGOX combines Internet search data at the national, regional and state levels with traditional influenza surveillance data from the Centers for Disease Control and Prevention (CDC). It accounts for both the spatial correlation structure of state-level influenza activity and the evolution of people's Internet search patterns. For real-time state-level influenza estimation from 2014 to 2020, ARGOX achieves on average a 28% error reduction over the best alternative method. ARGOX is robust and reliable, and it can potentially be applied to track influenza activity at the county and city levels, as well as other infectious diseases.
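The abstract describes ARGOX only at a high level. To make the general idea of "augmented regression" concrete, here is a minimal single-state sketch. It assumes weekly data in which the CDC influenza-like illness (ILI) rate is published only up to the previous week, while search volumes are already available for the current week. It regresses ILI on its own recent lags plus state, regional and national search volumes, fitted by ridge regression over a rolling training window. Ridge is a stand-in penalty chosen for illustration. This is not the ARGOX model: it ignores the cross-state spatial correlation that ARGOX accounts for. All function and parameter names (`nowcast`, `window`, `lags`, `lam`) are hypothetical.

```python
# Minimal single-state sketch (NOT the ARGOX model): ridge-penalized regression of
# this week's ILI on lagged CDC ILI values plus this week's search volumes at the
# state, regional and national levels. All names here are hypothetical.
from typing import List, Sequence


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def ridge_fit(X: List[List[float]], y: List[float], lam: float) -> List[float]:
    """Closed-form ridge: solve (X^T X + lam * D) beta = X^T y, where D penalizes
    every coefficient except the intercept in column 0."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    for i in range(1, k):
        xtx[i][i] += lam
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)


def features(ili: Sequence[float], s_state: Sequence[float], s_region: Sequence[float],
             s_nation: Sequence[float], t: int, lags: int) -> List[float]:
    """Row for week t: intercept, CDC ILI at weeks t-1..t-lags (already published),
    and the search volumes observed during week t itself."""
    lagged = [ili[t - l] for l in range(1, lags + 1)]
    return [1.0] + lagged + [s_state[t], s_region[t], s_nation[t]]


def nowcast(ili: Sequence[float], s_state: Sequence[float], s_region: Sequence[float],
            s_nation: Sequence[float], t: int, window: int = 52, lags: int = 2,
            lam: float = 1.0) -> float:
    """Estimate ILI for week t without reading ili[t]: train on the most recent
    `window` weeks that have a full lag history, then apply to week t's features."""
    start = max(lags, t - window)
    weeks = range(start, t)
    X = [features(ili, s_state, s_region, s_nation, u, lags) for u in weeks]
    y = [ili[u] for u in weeks]
    beta = ridge_fit(X, y, lam)
    x_t = features(ili, s_state, s_region, s_nation, t, lags)
    return sum(b * x for b, x in zip(beta, x_t))
```

Because `nowcast` never reads `ili[t]`, it can be called with the ILI series truncated at week `t`, which mirrors the real-time setting. The sketch penalizes unstandardized features. A practical version would standardize them and tune `lam`, `window` and `lags` (for example by cross-validation on past seasons); those choices are assumptions here, not details taken from ARGOX.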
