Mining spatio-temporal information on microblogging streams using a density-based online clustering method

Highlights
- We apply a density-based stream clustering method to mine Twitter data.
- The method detects events in real time, together with their geospatial features.
- The detection results can be used to estimate the temporal and spatial impacts of events.
- The method is well suited to awareness of large-scale events and to risk management.

Social networks have been regarded as a timely and cost-effective source of spatio-temporal information for many fields of application. Several research groups have developed methods for detecting topics in text streams, and popular microblogging services such as Twitter provide lists of top trending topics. Even so, users still cannot pick out all real-time event topics with a comprehensive spatio-temporal view that satisfies their information needs. This paper investigates how microblogging social networks (i.e., Twitter) can serve as a reliable source of information on emerging events, by extracting spatio-temporal features from messages to enhance event awareness. We apply a density-based online clustering method to microblogging text streams in order to obtain temporal and geospatial features of real-world events. By analyzing the events detected by our system, the temporal and spatial impacts of emerging events can be estimated, in support of situational awareness and risk management.
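To make the general idea of density-based online clustering over a message stream concrete, here is a minimal Python sketch. It is not the method developed in this paper, whose details the abstract does not give. It assumes each message has already been reduced to a latitude, a longitude, and a timestamp in seconds, and it groups messages into micro-clusters whose weights fade exponentially over time. The class names, the parameters (`radius_deg`, `half_life_s`, `min_weight`, `prune_weight`), and their default values are all hypothetical choices made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class MicroCluster:
    lat: float
    lon: float
    weight: float        # faded count of messages absorbed so far
    last_update: float   # timestamp (seconds) of the last fading step


class OnlineDensityClusterer:
    """Illustrative one-pass clusterer; all parameters are assumptions."""

    def __init__(self, radius_deg=0.5, half_life_s=3600.0,
                 min_weight=3.0, prune_weight=0.05):
        self.radius_deg = radius_deg      # max distance to join a cluster
        self.half_life_s = half_life_s    # time for a weight to halve
        self.min_weight = min_weight      # density threshold for an "event"
        self.prune_weight = prune_weight  # clusters below this are dropped
        self.clusters = []

    def _fade(self, mc, now):
        # Exponentially decay the weight for the time elapsed since last update.
        age = max(0.0, now - mc.last_update)
        mc.weight *= 0.5 ** (age / self.half_life_s)
        mc.last_update = max(mc.last_update, now)

    def add(self, lat, lon, t):
        """Absorb one message; timestamps are assumed to be non-decreasing."""
        nearest, nearest_dist = None, self.radius_deg
        for mc in self.clusters:
            self._fade(mc, t)
            # Crude planar distance in degrees, adequate only for a sketch.
            dist = math.hypot(lat - mc.lat, lon - mc.lon)
            if dist <= nearest_dist:
                nearest, nearest_dist = mc, dist
        if nearest is None:
            nearest = MicroCluster(lat, lon, 1.0, t)
            self.clusters.append(nearest)
        else:
            # Weighted running mean of the cluster centre.
            nearest.weight += 1.0
            nearest.lat += (lat - nearest.lat) / nearest.weight
            nearest.lon += (lon - nearest.lon) / nearest.weight
        # Forget clusters that have faded away.
        self.clusters = [mc for mc in self.clusters
                         if mc.weight >= self.prune_weight]
        return nearest

    def dense_clusters(self):
        """Clusters dense enough, as of the latest message, to count as events."""
        return [mc for mc in self.clusters if mc.weight >= self.min_weight]
```

In this sketch, a burst of nearby messages pushes a micro-cluster above `min_weight`, which gives a location and a time window for a candidate event, while the fading step lets stale clusters disappear without storing the whole stream. A real system would also need to cluster on message text, which this sketch omits.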
