SNAF: Observation filtering and location inference for event monitoring on Twitter

Twitter has recently emerged as a popular microblogging service with 284 million monthly active users around the world. Some of the 500 million tweets posted on Twitter every day are personal observations of the poster's immediate environment. If provided with time and location information, these observations can be treated as sensor readings for monitoring and localizing objects and events of interest. Location information on Twitter, however, is scarce: fewer than 1% of tweets have associated GPS coordinates. Current research on Twitter location inference mostly focuses on city-level or coarser inference and cannot provide accurate results for fine-grained locations. We propose an event monitoring system for Twitter that emphasizes local events, called SNAF (Sense and Focus). The system filters personal observations posted on Twitter and infers the location of each report. Our extensive experiments with real Twitter data show that the proposed observation filtering approach achieves about a 22% improvement over existing filtering techniques, and that our location inference approach increases location accuracy by up to 36% within the 3 km error range. By aggregating the observation reports with location information, our prototype event monitoring system can detect real-world events, in many cases earlier than news reports.
