Predicting Flu Trends using Twitter data

Reducing the impact of seasonal influenza epidemics and of pandemics such as H1N1 is of paramount importance for public health authorities. Studies have shown that effective interventions can contain an epidemic if it is detected early. The traditional approach employed by the Centers for Disease Control and Prevention (CDC) collects influenza-like illness (ILI) activity data from “sentinel” medical practices. Typically there is a 1–2 week delay between the time a patient is diagnosed and the moment that data point becomes available in aggregate ILI reports. In this paper we present the Social Network Enabled Flu Trends (SNEFT) framework, which monitors messages posted on Twitter that mention flu indicators in order to track and predict the emergence and spread of an influenza epidemic in a population. Based on data collected during 2009 and 2010, we find that the volume of flu-related tweets is highly correlated with the number of ILI cases reported by the CDC. We further devise auto-regression models to predict the ILI activity level in a population. The models predict the data collected and published by the CDC, namely the percentage of visits to “sentinel” physicians attributable to ILI in successive weeks. We test the models on previous CDC data, with and without measures of Twitter data, and show that Twitter data can substantially improve the models' prediction accuracy. Twitter data can therefore provide a real-time assessment of ILI activity.
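To make the modeling idea concrete, the following is a minimal sketch of an auto-regressive model with an exogenous Twitter input, fitted by ordinary least squares. It is not the SNEFT implementation: the function names `fit_arx` and `predict_next`, the lag orders `p` and `q`, the use of lagged rather than same-week tweet volume, and the synthetic weekly series are all assumptions made for illustration.

```python
import numpy as np

def fit_arx(ili, tweets, p=1, q=1):
    """Least-squares fit of the (assumed) model
    ili[t] = sum_i a_i * ili[t-i] + sum_j b_j * tweets[t-j] + c,
    for i = 1..p and j = 1..q. Returns [a_1..a_p, b_1..b_q, c]."""
    ili = np.asarray(ili, dtype=float)
    tweets = np.asarray(tweets, dtype=float)
    start = max(p, q)
    rows = []
    for t in range(start, len(ili)):
        past_ili = ili[t - p:t][::-1]      # most recent week first
        past_tw = tweets[t - q:t][::-1]
        rows.append(np.concatenate([past_ili, past_tw, [1.0]]))
    X = np.array(rows)
    y = ili[start:]
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def predict_next(coef, ili, tweets, p=1, q=1):
    """One-step-ahead ILI prediction from the latest p and q weeks."""
    ili = np.asarray(ili, dtype=float)
    tweets = np.asarray(tweets, dtype=float)
    x = np.concatenate([ili[len(ili) - p:][::-1],
                        tweets[len(tweets) - q:][::-1],
                        [1.0]])
    return float(x @ coef)

# Synthetic weekly data (hypothetical, not CDC or Twitter measurements).
rng = np.random.default_rng(0)
tweets = rng.random(60)                    # normalized flu-tweet volume
ili = np.zeros(60)                         # ILI visit percentage
for t in range(1, 60):
    ili[t] = 0.5 * ili[t - 1] + 0.3 * tweets[t - 1] + 0.1

coef = fit_arx(ili, tweets, p=1, q=1)
correlation = np.corrcoef(tweets[:-1], ili[1:])[0, 1]  # lag-1 Pearson r
forecast = predict_next(coef, ili, tweets, p=1, q=1)
```

Setting `q=0` drops the Twitter terms and leaves a purely auto-regressive baseline, so the same routine can be used to compare prediction error with and without the Twitter input.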
