Flu Gone Viral: Syndromic Surveillance of Flu on Twitter Using Temporal Topic Models

Surveillance of epidemic outbreaks and their spread using social media is an important tool for governments and public health authorities. Machine learning techniques for nowcasting the flu have made significant inroads into correlating social media trends with case counts and the prevalence of epidemics in a population. However, there is a disconnect between data-driven methods for forecasting flu incidence and epidemiological models that adopt a state-based understanding of transitions, which can lead to sub-optimal predictions. Furthermore, models of epidemiological activity and models of social activity (such as activity on Twitter) predict curves of different shapes and differ in important ways. We propose a temporal topic model that captures the hidden states of a user from their tweets and aggregates these states over a geographical region to better estimate trends. We show that our approach helps fill the gap between phenomenological methods for disease surveillance and epidemiological models. We validate this approach by modeling the flu using Twitter data from multiple countries in South America. We demonstrate that our model consistently outperforms plain vocabulary-based approaches in predicting flu case counts, while also producing better flu-peak predictions than competing methods. We also show that our fine-grained modeling can reconcile some contrasting behaviors between epidemiological and social models.
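The idea of inferring a hidden per-user health state from tweets and then aggregating those states by region can be illustrated with a much simpler stand-in. The sketch below is not the authors' temporal topic model: it swaps in a plain hidden Markov model with SEIR-inspired states (S, E, I, R) decoded by the standard Viterbi algorithm. Everything in it is a hypothetical assumption for illustration: the state chain, the transition and emission probabilities in `TRANS`, `START`, and `EMIT`, the four tweet "word classes" used as observations, one observation per user per time step, and the helper names `viterbi` and `regional_infected_counts`.

```python
import math
from collections import defaultdict

# Hypothetical SEIR-inspired hidden states for a single user.
STATES = ["S", "E", "I", "R"]

# Hypothetical left-to-right transition probabilities (row = from, col = to).
TRANS = [
    [0.90, 0.10, 0.00, 0.00],
    [0.00, 0.50, 0.50, 0.00],
    [0.00, 0.00, 0.60, 0.40],
    [0.00, 0.00, 0.00, 1.00],
]
START = [0.97, 0.01, 0.01, 0.01]

# Hypothetical emission probabilities over 4 tweet "word classes":
# 0 = unrelated, 1 = exposure talk, 2 = symptom talk, 3 = recovery talk.
EMIT = [
    [0.85, 0.10, 0.03, 0.02],
    [0.40, 0.45, 0.10, 0.05],
    [0.15, 0.10, 0.70, 0.05],
    [0.50, 0.05, 0.05, 0.40],
]


def log(p):
    """Log-probability that maps zero to -inf (forbidden transitions)."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(obs):
    """Return the most likely hidden state sequence for one user's observations."""
    n = len(STATES)
    score = [log(START[s]) + log(EMIT[s][obs[0]]) for s in range(n)]
    back = []  # back[k][s] = best predecessor of state s at step k + 1
    for o in obs[1:]:
        ptr, new = [], []
        for s in range(n):
            best_prev = max(range(n), key=lambda p: score[p] + log(TRANS[p][s]))
            ptr.append(best_prev)
            new.append(score[best_prev] + log(TRANS[best_prev][s]) + log(EMIT[s][o]))
        back.append(ptr)
        score = new
    # Trace the best path backwards from the most likely final state.
    path = [max(range(n), key=lambda s: score[s])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return [STATES[s] for s in reversed(path)]


def regional_infected_counts(users, num_steps):
    """Count users decoded as infected ("I") per region and time step.

    users: iterable of (region, observation sequence of length num_steps).
    """
    counts = defaultdict(lambda: [0] * num_steps)
    for region, obs in users:
        for t, state in enumerate(viterbi(obs)):
            if state == "I":
                counts[region][t] += 1
    return dict(counts)
```

For example, `regional_infected_counts([("north", [0, 1, 2, 2, 3]), ("north", [0, 0, 0, 0, 0]), ("south", [0, 0, 1, 2, 3])], 5)` decodes the first user as S, E, I, I, R and the last as S, S, E, I, R, giving a regional infected series of `[0, 0, 1, 1, 0]` for "north" and `[0, 0, 0, 1, 0]` for "south". Aggregating decoded states, rather than raw keyword counts, is what lets a state-based view separate exposure talk from symptom talk.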
