Towards Exploiting Social Networks for Detecting Epidemic Outbreaks

Social networks are becoming a valuable source of information for applications in many domains. In particular, many studies have highlighted the potential of social networks for early detection of epidemic outbreaks, due to their capability to transmit information faster than traditional channels, thus leading to quicker reactions of public health officials. However, most of these studies have investigated only one or two diseases, and consequently, to date, no study in the literature has investigated whether and how different kinds of outbreaks lead to different temporal dynamics of the messages exchanged over social networks. Furthermore, if the variability is wide, it is not clear whether a single generic solution could detect multiple epidemic outbreaks, or whether specifically tailored approaches should be implemented for each disease. To gain insight into these open points, we collected a massive dataset containing more than one hundred million Twitter messages from different countries, looking for those relevant to the early detection of outbreaks of multiple diseases. The collected results highlight significant variability in the temporal patterns of Twitter messages among different diseases. In this paper, we report the main findings of this analysis and propose a set of steps to exploit social networks for early detection of epidemic outbreaks, including a proper document model for the outbreaks, a graphical user interface for public health officials, and the identification of suitable sources of information to serve as ground truth for the assessment of outbreak detection algorithms.
