Early Stage Influenza Detection from Twitter

Influenza is an acute respiratory illness that recurs virtually every year and causes substantial disease, death, and expense. Detecting influenza at its earliest stage would enable timely action that could reduce the spread of the illness. Existing surveillance systems such as CDC and EISS, which collect diagnosis data, are almost entirely manual, resulting in delays of about two weeks in clinical data acquisition. Twitter, a popular microblogging service, is a well-suited source for early-stage flu detection because of its real-time nature. For example, when a flu outbreak begins, people who catch the flu may post related tweets, which makes it possible to detect the outbreak promptly. In this paper, we investigate real-time flu detection on Twitter data by proposing the Flu Markov Network (Flu-MN), a spatio-temporal unsupervised Bayesian algorithm based on a four-phase Markov network that aims to identify a flu outbreak at its earliest stage. We test our model against baselines on real Twitter datasets from the United States in multiple applications, such as real-time flu outbreak detection, future epidemic phase prediction, and influenza-like illness (ILI) physician visits. Experimental results show the robustness and effectiveness of our approach. We build a real-time flu reporting system based on the proposed approach, and we hope that it will help governments and health organizations identify flu outbreaks and take timely action to reduce unnecessary mortality.
