Spatial-Temporal Event Detection from Geo-Tagged Tweets

As one of the most popular social networking services in the world, Twitter allows users to post messages along with their current geographic locations. Such georeferenced or geo-tagged Twitter datasets can benefit location-based services, targeted advertising and geosocial studies. Our study focused on detecting small-scale spatial-temporal events and characterizing their textual content. First, we used Spatial-Temporal Density-Based Spatial Clustering of Applications with Noise (ST-DBSCAN) to cluster the tweets in space and time. Then, we summarized the word frequencies for each cluster and modeled the potential topics with the Latent Dirichlet Allocation (LDA) algorithm. Using two years of Twitter data from four college cities in the U.S., we were able to determine the spatial-temporal patterns of two known events, two unknown events and one recurring event, which we then explored and modeled further to identify the semantic content of the events. This paper presents our process and recommendations for both finding event-related tweets and understanding the spatial-temporal behaviors and semantic natures of the detected events.
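The two-stage process described above (density-based clustering in space and time, followed by per-cluster word frequencies) can be illustrated with a minimal sketch. It is a simplified variant for illustration only, not the authors' implementation: it uses a brute-force neighbor search, assumes coordinates are already projected to meters and timestamps expressed in minutes, and omits refinements of the published ST-DBSCAN algorithm. The function names, thresholds and sample tweets are all hypothetical. The resulting word counts are the kind of input that could then be passed to a topic model such as LDA.

```python
from collections import Counter, deque
from math import hypot

NOISE = -1  # label for points that belong to no cluster


def st_dbscan(points, eps_space, eps_time, min_pts):
    """Simplified ST-DBSCAN over (x, y, t) tuples; returns one label per point."""
    labels = [None] * len(points)

    def neighbors(i):
        # A neighbor must be close in BOTH space and time (includes i itself).
        xi, yi, ti = points[i]
        return [j for j, (x, y, t) in enumerate(points)
                if hypot(x - xi, y - yi) <= eps_space and abs(t - ti) <= eps_time]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        labels[i] = cluster
        queue = deque(seeds)
        while queue:
            j = queue.popleft()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins but does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_pts:  # core point: keep expanding
                queue.extend(reachable)
        cluster += 1
    return labels


def cluster_word_counts(texts, labels):
    """Summarize word frequencies for each non-noise cluster."""
    counts = {}
    for text, label in zip(texts, labels):
        if label != NOISE:
            counts.setdefault(label, Counter()).update(text.lower().split())
    return counts


# Hypothetical sample: (x in m, y in m, t in min, tweet text)
tweets = [
    (0, 0, 0, "game day at the stadium"),
    (10, 5, 3, "stadium is packed for the game"),
    (5, 12, 6, "go team great game"),
    (5000, 5000, 2, "coffee downtown"),
    (8, 3, 900, "quiet night near the stadium"),
]
points = [(x, y, t) for x, y, t, _ in tweets]
texts = [text for _, _, _, text in tweets]

labels = st_dbscan(points, eps_space=50, eps_time=30, min_pts=3)
counts = cluster_word_counts(texts, labels)
print(labels)
print(counts[0].most_common(3))
```

In this sample the last tweet is spatially close to the stadium cluster but far away in time, so it is left as noise. That is the behavior that separates a spatial-temporal event from a location that is merely popular.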
