Identification and Validation of Real-Time Health Events through Social Media

Twitter, the popular microblogging platform, has more than five hundred million registered users (Tweeters), who generate a large amount of information every day. This paper explores both the challenge and the opportunity of using this information to analyse health events, ideally in real time. Such real-time information is essential for detecting disease outbreaks and for identifying where they occur and who might be affected. In this context, however, it is essential to verify that the information is accurate and that it can be compared with other data sources. This paper presents a methodology and infrastructure that deliver such capabilities. Unlike other approaches, which have been small in scale, this work exploits large-scale Cloud facilities and much larger collections of data. Specifically, we collected and analysed over 46 million tweets from the three most populous cities in Australia (Sydney, Melbourne and Brisbane) to find patterns related to health events. Five diseases were explored: Ebola, dengue fever, flu, H1N1 and hay fever; the platform can, however, be used for other disease areas. We compared and validated the results against Google Trends data as well as data from the Australian Institute of Health and Welfare, and identified a high and measurable correlation between our data and these other sources. Building on these quantifiable degrees of accuracy, we suggest that social media can indeed be a key means of alerting authorities and the population at large to health events, e.g. pandemics, and of allowing them to track disease spread. At present no such infrastructure or capability exists.
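As an illustration of the validation step, the sketch below computes a Pearson correlation coefficient between a weekly count of disease-related tweets and a reference series such as a Google Trends index. The abstract does not state which correlation measure was used, so Pearson's r is an assumption here, and the function name, variable names and numbers are all hypothetical and made up for the example.

```python
from math import sqrt


def pearson(xs, ys):
    """Return Pearson's r for two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Hypothetical, made-up weekly series (not data from the paper).
weekly_tweet_counts = [120, 150, 310, 480, 390, 200]  # flu-related tweets
weekly_trends_index = [22, 25, 51, 80, 66, 35]        # reference index

r = pearson(weekly_tweet_counts, weekly_trends_index)
print(f"Pearson r = {r:.3f}")
```

A value of r close to 1 would indicate that the tweet volume rises and falls together with the reference series. In practice the two series would first need to be aligned to the same time buckets and region.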
