TwitterNews: Real time event detection from the Twitter data stream

Research on event detection from Twitter streaming data has been gaining momentum in the last couple of years. Although such data is noisy and often contains misleading information, Twitter can be a rich source of information if harnessed properly. In this paper, we propose a scalable event detection system, TwitterNews, to detect and track newsworthy events in real time from Twitter. TwitterNews takes a novel approach: it combines a random indexing based term vector model with locality sensitive hashing, which aids in performing incremental clustering of tweets related to various events within a fixed time. TwitterNews also incorporates an effective strategy to deal with the cluster fragmentation issue prevalent in incremental clustering. The set of candidate events generated by TwitterNews is then filtered to report the newsworthy events, along with an automatically selected representative tweet from each event cluster. Finally, we evaluate the effectiveness of TwitterNews, in terms of recall and precision, using a publicly available corpus.
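The combination of random indexing and locality sensitive hashing described above can be illustrated with a minimal sketch. This is not the TwitterNews implementation: the dimensionality, the number of non-zero entries, the number of hyperplanes, the similarity threshold, the whitespace tokenizer, and all function and class names are assumptions chosen for illustration. The sketch uses a single hash table and omits the time window, the handling of cluster fragmentation, and the newsworthiness filter.

```python
import math
import random
from collections import defaultdict

# All constants below are illustrative assumptions, not values from the paper.
DIM = 256        # dimensionality of the reduced term space
NONZEROS = 4     # non-zero entries per random index vector
BITS = 8         # random hyperplanes per LSH signature
THRESHOLD = 0.5  # cosine similarity needed to join an existing cluster


def index_vector(term):
    """Sparse ternary index vector for a term, reproducible from the term itself."""
    rng = random.Random("ri:" + term)
    vec = [0.0] * DIM
    for pos in rng.sample(range(DIM), NONZEROS):
        vec[pos] = rng.choice((-1.0, 1.0))
    return vec


def tweet_vector(text):
    """Sum the index vectors of the tweet's terms (naive whitespace tokenization)."""
    vec = [0.0] * DIM
    for term in text.lower().split():
        for i, x in enumerate(index_vector(term)):
            vec[i] += x
    return vec


def cosine(u, v):
    """Cosine similarity; defined as 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


class IncrementalClusterer:
    """Hypothetical one-table LSH clusterer; each tweet is processed once, in order."""

    def __init__(self, seed=0):
        rng = random.Random(seed)
        self.planes = [[rng.gauss(0.0, 1.0) for _ in range(DIM)] for _ in range(BITS)]
        self.buckets = defaultdict(list)  # signature -> [(vector, cluster_id)]
        self.next_id = 0

    def signature(self, vec):
        """One bit per hyperplane: which side of the plane the vector falls on."""
        return tuple(sum(p * x for p, x in zip(plane, vec)) >= 0 for plane in self.planes)

    def add(self, text):
        """Assign the tweet to the closest cluster in its bucket, or open a new one."""
        vec = tweet_vector(text)
        sig = self.signature(vec)
        best_id, best_sim = None, THRESHOLD
        for other, cid in self.buckets[sig]:  # only same-bucket tweets are compared
            sim = cosine(vec, other)
            if sim >= best_sim:
                best_id, best_sim = cid, sim
        if best_id is None:
            best_id = self.next_id
            self.next_id += 1
        self.buckets[sig].append((vec, best_id))
        return best_id
```

Calling `add` on each incoming tweet returns a cluster identifier. Because only tweets sharing an LSH signature are compared, the cost per tweet depends on bucket size rather than on the total number of tweets seen. The trade-off is that similar tweets can land in different buckets, which is one way fragmented clusters can arise in a scheme like this.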
