Spatio-temporal characteristics of bursty words in Twitter streams

Social networking and microblogging services such as Twitter provide a continuous source of data from which useful information can be extracted. Detecting and characterizing bursty words plays an important role in processing such data, as bursty words may hint at events or trending topics of social importance upon which actions can be triggered. While there are several approaches to extracting bursty words from message content, little work addresses the dynamics of continuous message streams, in particular streams of geo-tagged messages. In this paper, we present a framework that identifies bursty words in Twitter text streams and describes them in terms of their spatio-temporal characteristics. Using a time-aware word usage baseline, we propose a sliding-window approach over incoming tweets to identify words that exceed a burstiness threshold. For each such word, a time-varying spatial signature is then determined, relying primarily on geo-tagged tweets. To deal with the noise and sparsity of geo-tagged tweets, we propose a novel graph-based regularization procedure that exploits spatial co-occurrences of bursty words and allows sound spatial signatures to be computed. We evaluate our online processing framework on two real-world Twitter datasets. The results show that the framework efficiently and reliably extracts bursty words and describes their spatio-temporal evolution.
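The abstract does not specify how the baseline or the burstiness measure is defined. The following is a minimal sketch of the general idea (a sliding window over incoming tweets compared against a time-aware baseline), written under explicit assumptions that are not taken from the paper: the baseline stores a mean and standard deviation of each word's relative frequency per hour-of-day slot, burstiness is measured as a z-score, and a hypothetical `min_count` filter suppresses rare words. All class, method, and parameter names are illustrative.

```python
from collections import Counter, deque
from dataclasses import dataclass, field


@dataclass
class SlidingWindowBurstDetector:
    """Illustrative sketch, not the paper's method: flags words whose relative
    frequency in the current window exceeds a time-aware baseline."""
    window_size: int            # number of most recent tweets kept in the window
    threshold: float = 3.0      # burstiness threshold (assumed: z-score)
    min_count: int = 2          # hypothetical filter against one-off words
    # baseline[(hour, word)] = (mean, std) of the word's relative frequency in
    # that hour-of-day slot, assumed to be estimated offline from history
    baseline: dict = field(default_factory=dict)
    _window: deque = field(default_factory=deque, init=False, repr=False)
    _counts: Counter = field(default_factory=Counter, init=False, repr=False)

    def add(self, words):
        """Slide the window forward by one tweet, given as its words."""
        tokens = set(words)  # count each word at most once per tweet
        self._window.append(tokens)
        self._counts.update(tokens)
        if len(self._window) > self.window_size:
            # Evict the oldest tweet and decrement its word counts.
            for word in self._window.popleft():
                self._counts[word] -= 1
                if self._counts[word] == 0:
                    del self._counts[word]

    def bursty_words(self, hour):
        """Return {word: score} for words above the threshold at this hour."""
        n = len(self._window)
        result = {}
        for word, count in self._counts.items():
            if count < self.min_count:
                continue
            freq = count / n  # fraction of tweets in the window using the word
            mean, std = self.baseline.get((hour, word), (0.0, 0.0))
            std = max(std, 1e-3)  # assumed floor to avoid division by zero
            score = (freq - mean) / std
            if score >= self.threshold:
                result[word] = score
        return result
```

For example, with a baseline in which "quake" is normally rare at noon and "coffee" is normally common, a window dominated by "quake" flags only "quake": `d.bursty_words(hour=12)` returns a dictionary whose only key is `"quake"`. Counting each word once per tweet keeps a single repetitive tweet from inflating a word's frequency; the spatial signature and the graph-based regularization described in the abstract are not covered by this sketch.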
