A combined classification-clustering framework for identifying disruptive events

Twitter is a popular micro-blogging web application serving hundreds of millions of users. Users publish short messages to communicate with friends and family, express their opinions, and broadcast news and information about a variety of topics, all in real time. User-generated content can serve as a rich source for identifying real-world events and for extracting useful knowledge about disruptive events in a given region. In this paper, we propose a novel detection framework for identifying events in real time, including a main event and associated disruptive events, from Twitter data. The approach is based on five steps: data collection, pre-processing, classification, online clustering and summarization. We use a Naive Bayes classification model and an online clustering method, and validate the framework on a major real-world event (Formula 1 Abu Dhabi Grand Prix 2013).
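The classification and online clustering steps can be illustrated with a minimal sketch. The abstract does not specify features, smoothing, the similarity measure or any threshold, so everything below is an assumption made for illustration: bag-of-words tokens, a multinomial Naive Bayes with add-one smoothing, and a single-pass clustering that assigns each tweet to the most similar existing cluster by cosine similarity. The names `tokenize`, `NaiveBayes`, `online_cluster` and the `threshold` parameter are hypothetical and are not taken from the paper.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Pre-processing (assumed): lowercase, drop URLs and @mentions."""
    text = re.sub(r"https?://\S+|@\w+", " ", text.lower())
    return re.findall(r"[a-z0-9]+", text)


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> token counts
        self.doc_counts = Counter()              # label -> number of tweets
        self.vocab = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.doc_counts[label] += 1
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        tokens = tokenize(text)
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # log prior + sum of smoothed log likelihoods
            score = math.log(n_docs / total_docs)
            for tok in tokens:
                score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def online_cluster(texts, threshold=0.3):
    """Single pass: join the most similar cluster or start a new one."""
    clusters = []
    for text in texts:
        vec = Counter(tokenize(text))
        best, best_sim = None, 0.0
        for cluster in clusters:
            sim = cosine(vec, cluster["centroid"])
            if sim > best_sim:
                best, best_sim = cluster, sim
        if best is not None and best_sim >= threshold:
            best["centroid"].update(vec)  # summed counts act as the centroid
            best["members"].append(text)
        else:
            clusters.append({"centroid": Counter(vec), "members": [text]})
    return clusters


# Toy training data, invented for this example only.
train_texts = [
    "crash at turn one safety car deployed",
    "traffic jam near the circuit gates road closed",
    "fire in the grandstand area evacuated",
    "love this weather so much",
    "having coffee with friends today",
    "great music at the party tonight",
]
train_labels = ["event"] * 3 + ["other"] * 3
model = NaiveBayes().fit(train_texts, train_labels)

stream = [
    "crash at turn one",
    "big crash at turn one now",
    "road closed near the circuit",
    "circuit road closed by police",
]
event_tweets = [t for t in stream if model.predict(t) == "event"]
clusters = online_cluster(event_tweets)
```

In this sketch the classifier acts as a filter that keeps only event-related tweets, and the clustering step then groups the surviving tweets into candidate events as they arrive. Because the centroid is a running sum of term counts and cosine similarity ignores vector length, no re-normalization is needed when a tweet joins a cluster. A summarization step could, for example, report the member closest to each centroid, but that choice is likewise an assumption.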
