Nowcasting influenza outbreaks using open-source media reports

We construct and verify a statistical method to nowcast influenza activity from a time-series of the frequency of reports concerning influenza-related topics. Such reports are published electronically by both public health organizations and newspapers and other media sources, and thus can be harvested easily via web crawlers. Media reports are timely, whereas reports from public health organizations are delayed by at least two weeks, so timely, open-source data can be used to compensate for the lag in "official" reports. We use morbidity data from networks of sentinel physicians (both the Centers for Disease Control and Prevention's ILINet and France's Sentinelles network) as the gold standard of influenza-like illness (ILI) activity. The time-series of media reports is obtained from HealthMap (http://healthmap.org). We find that the time-series of media reports shows some correlation (about 0.5) with ILI activity; further, this correlation can be leveraged in an autoregressive moving average model with exogenous inputs (ARMAX model) to nowcast ILI activity. We find that the ARMAX models have more predictive skill than autoregressive (AR) models fitted to ILI data alone; that is, it is possible to exploit the information content in the open-source data. We also find that when the open-source data are non-informative, the ARMAX models reproduce the performance of AR models. The statistical models are tested on data from the 2009 swine-flu outbreak as well as the mild 2011-2012 influenza season in the U.S.A.
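The comparison between an AR model and a model with an exogenous media input can be illustrated with a minimal sketch. This is not the authors' implementation: it drops the moving-average term (so it is an ARX model rather than a full ARMAX model), fits by ordinary least squares, and runs on synthetic data. All function names, coefficients, and the series themselves are hypothetical, chosen only to show how a timely exogenous series can compensate for a two-week lag in the ILI data.

```python
import random

def solve_linear(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (M[r][n] - tail) / M[r][r]
    return z

def fit_ols(rows, targets):
    """Ordinary least squares via the normal equations (X^T X) w = X^T y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(k)]
    return solve_linear(xtx, xty)

def build_rows(ili, media, lag, use_media):
    """Regressors for week t: intercept, ILI from `lag` weeks earlier,
    and (optionally) the media-report count for the current week."""
    rows, targets = [], []
    for t in range(lag, len(ili)):
        row = [1.0, ili[t - lag]]
        if use_media:
            row.append(media[t])
        rows.append(row)
        targets.append(ili[t])
    return rows, targets

def rmse(weights, rows, targets):
    """Root-mean-square error of the fitted linear model."""
    sq = [(sum(w * v for w, v in zip(weights, r)) - y) ** 2
          for r, y in zip(rows, targets)]
    return (sum(sq) / len(sq)) ** 0.5

# Synthetic data (assumption): ILI depends on last week's ILI and on
# this week's media-report frequency, plus small Gaussian noise.
rng = random.Random(0)
n_weeks, lag = 200, 2  # lag = 2 mimics the two-week reporting delay
media = [rng.random() for _ in range(n_weeks)]
ili = [1.0]
for t in range(1, n_weeks):
    ili.append(0.5 + 0.6 * ili[t - 1] + 2.0 * media[t] + rng.gauss(0.0, 0.1))

rows_ar, y_ar = build_rows(ili, media, lag, use_media=False)
rows_arx, y_arx = build_rows(ili, media, lag, use_media=True)
w_ar = fit_ols(rows_ar, y_ar)
w_arx = fit_ols(rows_arx, y_arx)
rmse_ar = rmse(w_ar, rows_ar, y_ar)
rmse_arx = rmse(w_arx, rows_arx, y_arx)
print("AR  in-sample RMSE:", round(rmse_ar, 3))
print("ARX in-sample RMSE:", round(rmse_arx, 3))
```

Because the AR model is nested inside the ARX model, the in-sample error of the ARX fit cannot exceed that of the AR fit; if the media series carried no information, its fitted coefficient would be near zero and the two models would behave alike. Judging real predictive skill would require a held-out comparison rather than the in-sample errors printed here.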
