Spatiotemporal transformation of social media geostreams: a case study of Twitter for flu risk analysis

Georeferenced social media data streams (social media geostreams) offer promising opportunities to gain new insights into the spatiotemporal aspects of human interactions in cyberspace and their relation to real-world activities. In particular, these opportunities are motivating public health researchers to improve the surveillance of disease epidemics through spatiotemporal analysis of social media geostreams. One essential requirement for such geostream-based disease surveillance is a scalable data infrastructure capable of transforming massive geostreams, in real time, into spatiotemporally organized data to which analytical methods are readily applicable. To fulfill this requirement, this study develops a data pipeline solution in which multiple computational components are integrated to collect, process, and aggregate social media geostreams in near real time. As a test case, the solution focuses on one well-known social media geostream, the Twitter data stream, and one type of disease epidemic, the flu. The pipeline facilitates multiscale spatiotemporal analysis of flu risks by collecting geotagged tweets from the Twitter Streaming API, identifying flu-related tweets through keyword matching, aggregating tweets at multiple spatial granularities in near real time, and storing the tweets and the aggregate statistics in a distributed NoSQL database. Although developed for the surveillance of flu epidemics, the pipeline could serve as a general framework for building scalable data infrastructures that support real-time spatiotemporal analysis of social media geostreams in application domains beyond disease mapping and public health.
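The keyword-matching and multiscale aggregation steps described above can be illustrated with a minimal sketch. This is not the authors' implementation: the keyword list, the tweet dictionary fields (`text`, `lon`, `lat`, `time`), the regular latitude/longitude grid, the cell sizes, and the hourly time bins are all assumptions chosen for illustration, and an in-memory counter stands in for the distributed NoSQL store.

```python
import math
import re
from collections import Counter
from datetime import datetime, timezone

# Hypothetical keyword list; the abstract does not say which terms are used.
FLU_KEYWORDS = {"flu", "influenza"}


def is_flu_related(text):
    """Return True if any whole word in the text is a flu keyword."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return any(token in FLU_KEYWORDS for token in tokens)


def cell_key(lon, lat, cell_deg):
    """Map a coordinate to the index of its square grid cell (assumed grid)."""
    return (math.floor(lon / cell_deg), math.floor(lat / cell_deg))


def aggregate(tweets, cell_sizes=(1.0, 0.1)):
    """Count flu-related tweets per (grid cell, hour) at each cell size."""
    counts = {size: Counter() for size in cell_sizes}
    for tweet in tweets:
        if not is_flu_related(tweet["text"]):
            continue
        # Truncate the timestamp to the hour to form the temporal bin.
        hour = tweet["time"].replace(minute=0, second=0, microsecond=0)
        for size in cell_sizes:
            key = (cell_key(tweet["lon"], tweet["lat"], size), hour)
            counts[size][key] += 1
    return counts


if __name__ == "__main__":
    sample = [
        {"text": "Home with the flu today", "lon": -88.24, "lat": 40.11,
         "time": datetime(2013, 1, 15, 9, 12, tzinfo=timezone.utc)},
        {"text": "Great game last night", "lon": -88.24, "lat": 40.11,
         "time": datetime(2013, 1, 15, 9, 30, tzinfo=timezone.utc)},
    ]
    for size, counter in aggregate(sample).items():
        print(size, dict(counter))
```

In a streaming setting, the same counting logic would be applied to each tweet as it arrives, with the per-cell counters written to the database instead of held in memory. Coarser and finer cell sizes are counted in the same pass, so each granularity can be queried without reprocessing the raw tweets.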
