Graph Clustering-Based Emerging Event Detection from Twitter Data Stream

Event detection from online social media is now important in many fields, such as crisis notification, health epidemic identification, and trending topic extraction. To address this problem, we propose a new methodology for capturing emerging events from the Twitter data stream. We define a tweet graph whose vertices are tweet term vectors and whose edges connect tweets by content similarity. Based on the assumption that an event corresponds to a set of similar tweets, we apply the Markov clustering algorithm to the tweet graph to group related tweets. Similar events that are connected across consecutive time intervals are then classified as an event trend line, and the first event in each such chain is considered the emerging event. We evaluated the proposed approach on thirty days of extracted Twitter data. The detected emerging events were studied and evaluated by fifteen volunteers, yielding 70-80% precision.
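
The graph construction and clustering steps can be illustrated with a minimal sketch. It is not the authors' implementation: the smoothed TF-IDF weighting, the cosine similarity measure, the 0.2 edge threshold, the inflation value of 2.0, and all function names are assumptions chosen for illustration, and the linking of events across time intervals is not covered.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Turn tokenized tweets into sparse, L2-normalized TF-IDF dicts."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # Smoothed IDF: an assumed weighting, not specified by the source.
        vec = {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def tweet_graph(vectors, threshold=0.2):
    """Adjacency matrix: cosine similarity at or above `threshold`, plus self-loops."""
    n = len(vectors)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        adj[i][i] = 1.0  # self-loops are commonly added before running MCL
        for j in range(i + 1, n):
            sim = sum(w * vectors[j].get(t, 0.0) for t, w in vectors[i].items())
            if sim >= threshold:
                adj[i][j] = adj[j][i] = sim
    return adj


def normalize_columns(m):
    """Scale each column to sum to 1 (column-stochastic matrix)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] for j in range(n)] for i in range(n)]


def markov_cluster(adj, inflation=2.0, max_iter=100, tol=1e-9):
    """Alternate expansion and inflation until the matrix stops changing."""
    n = len(adj)
    m = normalize_columns(adj)
    for _ in range(max_iter):
        # Expansion: square the matrix (two-step random walks).
        expanded = [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
                    for i in range(n)]
        # Inflation: element-wise power, then renormalize the columns.
        inflated = normalize_columns([[x ** inflation for x in row] for row in expanded])
        delta = max(abs(inflated[i][j] - m[i][j]) for i in range(n) for j in range(n))
        m = inflated
        if delta < tol:
            break
    # Each non-empty row belongs to an attractor; its non-zero columns form a cluster.
    clusters = {frozenset(j for j in range(n) if m[i][j] > 1e-6) for i in range(n)}
    clusters.discard(frozenset())
    return sorted(sorted(c) for c in clusters)


# Hypothetical toy tweets: two topics with disjoint vocabularies.
tweets = [
    "earthquake hits tokyo",
    "tokyo earthquake strong",
    "strong earthquake tokyo japan",
    "election results announced",
    "election vote results",
    "vote results election day",
]
clusters = markov_cluster(tweet_graph(tfidf_vectors([t.split() for t in tweets])))
print(clusters)
```

Each returned cluster is a list of tweet indices that, under the abstract's assumption, would be treated as one candidate event within a time interval. The dense pure-Python matrices keep the sketch dependency-free but scale cubically with the number of tweets, so a real stream would need sparse matrices and pruning.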
