Why is it Difficult to Detect Sudden and Unexpected Epidemic Outbreaks in Twitter?

Social media services such as Twitter are a valuable source of information for decision support systems. Many studies have shown that this also holds for the medical domain, where Twitter is considered a viable tool that lets public health officials sift through relevant information for the early detection, management, and control of epidemic outbreaks. This is possible because social media services can transmit information faster than traditional channels. However, most current studies limit their scope to the detection of common, seasonally recurring health events (e.g., Influenza-like Illness), partly because the noisy nature of Twitter data makes outbreak detection and management very challenging. Within the European project M-Eco, we developed a Twitter-based Epidemic Intelligence (EI) system that is designed to also handle a more general class of unexpected and aperiodic outbreaks. We faced three main research challenges in this endeavor: 1) dynamic classification, to manage the terminology evolution of Twitter messages; 2) alert generation, to produce reliable outbreak alerts by analyzing the (noisy) tweet time series; and 3) ranking and recommendation, to help domain experts better assess the generated alerts. In this paper, we empirically evaluate our proposed approach to these challenges using real-world outbreak datasets and a large collection of tweets. We validate our solution with domain experts, describe our experiences, and offer a more realistic view of the benefits and issues of analyzing social media for public health.
