Dengue surveillance based on a computational model of spatio-temporal locality of Twitter

Twitter is a unique social media channel, in the sense that users discuss and talk about the most diverse topics, including their health conditions. In this paper we analyze how Dengue epidemic is reflected on Twitter and to what extent that information can be used for the sake of surveillance. Dengue is a mosquito-borne infectious disease that is a leading cause of illness and death in tropical and subtropical regions, including Brazil. We propose an active surveillance methodology that is based on four dimensions: volume, location, time and public perception. First we explore the public perception dimension by performing sentiment analysis. This analysis enables us to filter out content that is not relevant for the sake of Dengue surveillance. Then, we verify the high correlation between the number of cases reported by official statistics and the number of tweets posted during the same time period (i.e., R2 = 0.9578). A clustering approach was used in order to exploit the spatio-temporal dimension, and the quality of the clusters obtained becomes evident when they are compared to official data (i.e., RandIndex = 0.8914). As an application, we propose a Dengue surveillance system that shows the evolution of the dengue situation reported in tweets, which is implemented in www.observatorio.inweb.org.br/dengue/.

[1]  Gunther Eysenbach,et al.  Infodemiology: Tracking Flu-Related Searches on the Web for Syndromic Surveillance , 2006, AMIA.

[2]  Mohammed J. Zaki,et al.  Lazy Associative Classification , 2006, Sixth International Conference on Data Mining (ICDM'06).

[3]  David M. Pennock,et al.  Using internet searches for influenza surveillance. , 2008, Clinical infectious diseases : an official publication of the Infectious Diseases Society of America.

[4]  G. Eysenbach,et al.  Pandemics in the Age of Twitter: Content Analysis of Tweets during the 2009 H1N1 Outbreak , 2010, PloS one.

[5]  Krishna P. Gummadi,et al.  Measuring User Influence in Twitter: The Million Follower Fallacy , 2010, ICWSM.

[6]  David M. Pennock,et al.  Predicting consumer behavior with Web search , 2010, Proceedings of the National Academy of Sciences.

[7]  G. Eysenbach Infodemiology and Infoveillance: Framework for an Emerging Set of Public Health Informatics Methods to Analyze Search, Communication and Publication Behavior on the Internet , 2009, Journal of medical Internet research.

[8]  Benyuan Liu,et al.  Predicting Flu Trends using Twitter data , 2011, 2011 IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS).

[9]  Tomasz Imielinski,et al.  Mining association rules between sets of items in large databases , 1993, SIGMOD Conference.

[10]  Bernardo A. Huberman,et al.  Predicting the Future with Social Media , 2010, 2010 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology.

[11]  Derya Birant,et al.  ST-DBSCAN: An algorithm for clustering spatial-temporal data , 2007, Data Knowl. Eng..

[12]  Heikki Mannila,et al.  The power of sampling in knowledge discovery , 1994, PODS '94.

[13]  Michael Marx,et al.  What does dengue disease surveillance contribute to predicting and detecting outbreaks and describing trends? , 2008, Tropical medicine & international health : TM & IH.

[14]  Hans-Peter Kriegel,et al.  A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise , 1996, KDD.

[15]  Jeremy Ginsberg,et al.  Detecting influenza epidemics using search engine query data , 2009, Nature.

[16]  Benyuan Liu,et al.  Vision: towards real time epidemic vigilance through online social networks: introducing SNEFT -- social network enabled flu trends , 2010, MCS '10.

[17]  A. Hulth,et al.  Web Queries as a Source for Syndromic Surveillance , 2009, PloS one.

[18]  Bernardo A. Huberman,et al.  Predicting the Future with Social Media , 2010, Web Intelligence.

[19]  Nello Cristianini,et al.  Tracking the flu pandemic by monitoring the social web , 2010, 2010 2nd International Workshop on Cognitive Information Processing.

[20]  Isabell M. Welpe,et al.  Predicting Elections with Twitter: What 140 Characters Reveal about Political Sentiment , 2010, ICWSM.

[21]  Yutaka Matsuo,et al.  Earthquake shakes Twitter users: real-time event detection by social sensors , 2010, WWW '10.

[22]  C. Dolea,et al.  World Health Organization , 1949, International Organization.

[23]  Nello Cristianini,et al.  Flu Detector - Tracking Epidemics on Twitter , 2010, ECML/PKDD.