Sense and Focus: Towards Effective Location Inference and Event Detection on Twitter

Twitter users post observations about their immediate environment as part of the 500 million tweets posted every day. As such, Twitter can become a source of invaluable information about objects, locations, and events, which can be analyzed and monitored in real time to understand not only what is happening in the world, but also exactly where an event is taking place. However, when treated as sensor readings, Twitter data is noisy, and information such as the location of a tweet may not be available; e.g., only 0.9% of tweets have GPS data. Due to the lack of accurate and fine-grained location information, existing Twitter event monitoring systems focus on identifying locations at the city level or coarser, which cannot provide details for local events. In this paper, we propose SNAF (Sense and Focus), an event monitoring system for Twitter data that emphasizes local events. We significantly increase the availability of location information by finding locations in tweet messages and in users' past tweets. We apply data cleaning techniques in our system, and through extensive experiments we show that our method can improve the accuracy of location inference by 5% to 20% across different error ranges. We also show that our prototype implementation of SNAF can identify critical local events in real time, in many cases earlier than news reports.
