An association rule dynamics and classification approach to event detection and tracking in Twitter

Twitter is a microblogging application for sending and retrieving instant online messages of at most 140 characters. Activity on Twitter has surged since its launch in 2006, and research on event detection in Twitter data (tweets) has increased steadily in recent years. With 284 million monthly active users, Twitter continues to grow in both size and activity. The network is rapidly changing the way a global audience sources information, and it influences the process of journalism [Newman, 2009]. Twitter is now perceived as an information network in addition to being a social network, which explains why traditional news media follow activity on Twitter to enhance their news reports and updates. Recognising the significance of the network as an information dissemination platform, news media maintain Twitter accounts where they post news headlines together with links to the full stories on their online news sites. In some cases, Twitter users post breaking news on the network before it is published by traditional news media, which can be ascribed to those users' proximity to the location of events. The use of Twitter for information dissemination and for opinion expression by different entities is now common. This has brought with it the computational challenge of extracting newsworthy content from noisy Twitter data. Given the enormous volume of data Twitter generates, users prefix keywords in tweets with the hashtag (#) symbol. Hashtags label the content of tweets and make it easy to search for and read tweets of interest. The volume of Twitter streaming data makes it imperative to develop Topic Detection and Tracking methods that extract newsworthy topics from tweets.
Since hashtags describe tweets and enhance their readability, this research shows how the appropriate use of hashtag keywords in tweets can reveal the temporal evolvement of related real-life topics and consequently enhance Topic Detection and Tracking on the Twitter network. We chose to apply our method to Twitter because of its restricted number of characters per message and because it allows data to be shared publicly. More importantly, our choice was based on the fact that hashtags are an inherent component of Twitter. To this end, the aim of this research is to develop, implement and validate a new approach that uses Association Rule Mining to extract newsworthy topics from the hashtags of tweets about real-life topics over a specified period. We term our novel methodology Transaction-based Rule Change Mining (TRCM). TRCM is a system built on top of the Apriori method of Association Rule Mining; it extracts patterns of Association Rule change in tweets' hashtag keywords across different periods of time and maps the extracted keywords to related real-life topics or scenarios. To the best of our knowledge, the dynamics of Association Rules of hashtag co-occurrences have not previously been explored as a Topic Detection and Tracking method on Twitter. Applying Apriori to the hashtags present in tweets at two consecutive periods, t and t + 1, produces two association rulesets, whose differences represent rule evolvement in the context of this research. A change in rules is discovered by matching every rule in the ruleset at time t against those in the ruleset at time t + 1. The changes are grouped under four identified rule types, namely 'New' rules, 'Unexpected' rules ('Unexpected Consequent' and 'Unexpected Conditional'), 'Emerging' rules and 'Dead' rules. The four rule types represent different levels of real-life topic evolvement.
For example, an emerging rule represents a very important occurrence such as breaking news, while unexpected rules represent an unexpected twist of events in an ongoing topic. A new rule represents dissimilarity between the rules in the rulesets at times t and t + 1. Finally, a dead rule represents a topic that is no longer present on the Twitter network. TRCM reveals the dynamics of the Association Rules present in tweets and demonstrates the linkage between the different types of rule dynamics and targeted real-life topics/events. In this research, we conducted experimental studies on tweets from different domains, such as sports and politics, to test the performance effectiveness of our method. We validated TRCM against carefully chosen ground truth. The outcomes of our research experiments include: (1) identification of four rule dynamics in tweets' hashtags, namely New rules, Emerging rules, Unexpected rules and Dead rules, using Association Rule Mining; these rules signify how news and events evolved in real-life scenarios; (2) identification of rule evolvement on the Twitter network using Rule Trend Analysis and Rule Trace; (3) detection and tracking of topic evolvement on Twitter using TRCM; and (4) identification of how the peculiar features of each TRCM rule type affect its performance effectiveness on real datasets.
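
The rule-matching idea described above can be illustrated with a small Python sketch. It is not the TRCM implementation, and several parts are assumptions made for illustration only. The function names `mine_rules` and `classify_changes` are hypothetical, as are the threshold values. Rules are restricted to a single hashtag on each side, so the mining step covers only the first two passes of Apriori. The matching criteria are an exact-match simplification: a rule is labelled from whether its conditional part (left-hand side) or consequent part (right-hand side) also appears in the earlier ruleset. The abstract does not specify the actual matching criteria, so the labels below should be read as one plausible interpretation.

```python
from collections import Counter
from itertools import permutations


def mine_rules(transactions, min_support=0.3, min_confidence=0.6):
    """Return rules (lhs, rhs) between single hashtags.

    Simplification (assumption): only itemsets of size 1 and 2 are counted,
    i.e. the first two passes of Apriori, not the full algorithm.
    """
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for tags in transactions:
        tags = set(tags)
        item_counts.update(tags)
        # Ordered pairs, so both a -> b and b -> a are candidate rules.
        pair_counts.update(permutations(sorted(tags), 2))
    rules = set()
    for (lhs, rhs), count in pair_counts.items():
        support = count / n
        confidence = count / item_counts[lhs]
        if support >= min_support and confidence >= min_confidence:
            rules.add((lhs, rhs))
    return rules


def classify_changes(rules_t, rules_t1):
    """Label rules at t + 1 relative to the ruleset at t, plus dead rules.

    The exact-match criteria are an illustrative assumption.
    """
    lhs_t = {lhs for lhs, _ in rules_t}
    rhs_t = {rhs for _, rhs in rules_t}
    labels = {}
    for rule in rules_t1:
        lhs, rhs = rule
        if rule in rules_t:
            labels[rule] = "emerging"                # both sides seen together at t
        elif lhs in lhs_t:
            labels[rule] = "unexpected consequent"   # known lhs, different rhs
        elif rhs in rhs_t:
            labels[rule] = "unexpected conditional"  # known rhs, different lhs
        else:
            labels[rule] = "new"                     # neither side seen at t
    lhs_t1 = {lhs for lhs, _ in rules_t1}
    rhs_t1 = {rhs for _, rhs in rules_t1}
    for rule in rules_t:
        lhs, rhs = rule
        if lhs not in lhs_t1 and rhs not in rhs_t1:
            labels[rule] = "dead"                    # no trace left at t + 1
    return labels
```

In use, each tweet in a period would be reduced to its set of hashtags, `mine_rules` would be run once per period, and the two resulting rulesets passed to `classify_changes`.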

[1]  James Allan,et al.  Topic detection and tracking: event-based information organization , 2002 .

[2]  Andrew McCallum,et al.  Efficient methods for topic model inference on streaming document collections , 2009, KDD.

[3]  Bu-Sung Lee,et al.  Event Detection in Twitter , 2011, ICWSM.

[4]  Leysia Palen,et al.  Natural Language Processing to the Rescue? Extracting "Situational Awareness" Tweets During Mass Emergency , 2011, ICWSM.

[5]  Mohamed Medhat Gaber,et al.  Extraction of Unexpected Rules from Twitter Hashtags and its Application to Sport Events , 2014, 2014 13th International Conference on Machine Learning and Applications.

[6]  Bing Liu.  Sentiment Analysis and Opinion Mining , 2011 .

[7]  Ana-Maria Popescu,et al.  Detecting controversial events from twitter , 2010, CIKM.

[8]  Carlos J. Martín-Dancausa,et al.  Spot the Ball: Detecting Sports Events on Twitter , 2014, ECIR.

[9]  Leysia Palen,et al.  (How) will the revolution be retweeted?: information diffusion and the 2011 Egyptian uprising , 2012, CSCW.

[10]  Maximilian Walther,et al.  Geo-spatial Event Detection in the Twitter Stream , 2013, ECIR.

[11]  Craig MacDonald,et al.  Scalable distributed event detection for Twitter , 2013, 2013 IEEE International Conference on Big Data.

[12]  Rajeev Motwani,et al.  Dynamic itemset counting and implication rules for market basket data , 1997, SIGMOD '97.

[13]  Hsia-Ching Chang,et al.  A new perspective on Twitter hashtag use: Diffusion of innovation theory , 2010, ASIST.

[14]  Julian Ausserhofer,et al.  NATIONAL POLITICS ON TWITTER , 2013 .

[15]  Jaishree Singh,et al.  Improving Efficiency of Apriori Algorithm Using Transaction Reduction , 2013 .

[16]  R. Balakrishnan,et al.  A textbook of graph theory , 1999 .

[17]  Vaibhavi N Patodkar,et al.  Twitter as a Corpus for Sentiment Analysis and Opinion Mining , 2016 .

[18]  Houfeng Wang,et al.  Entity-centric topic-oriented opinion summarization in twitter , 2012, KDD.

[19]  Duncan J. Watts,et al.  Everyone's an influencer: quantifying influence on twitter , 2011, WSDM '11.

[20]  Marc Cheong,et al.  Integrating web-based intelligence retrieval and decision-making from the twitter trends knowledge base , 2009, CIKM-SWSM.

[21]  Zhoujun Li,et al.  Emerging topic detection for organizations from microblogs , 2013, SIGIR.

[22]  Padmini Srinivasan,et al.  What's trending?: mining topical trends in UGC systems with YouTube as a case study , 2011, MDMKDD '11.

[23]  Timothy Baldwin,et al.  On-line Trend Analysis with Topic Models: #twitter Trends Detection Topic Model Online , 2012, COLING.

[24]  Piotr Indyk,et al.  Approximate nearest neighbors: towards removing the curse of dimensionality , 1998, STOC '98.

[25]  Takashi Washio,et al.  Using a Hash-Based Method for Apriori-Based Graph Mining , 2004, PKDD.

[26]  Gellof Kanselaar,et al.  Concrete and abstract visualizations in history learning tasks. , 2009, The British journal of educational psychology.

[27]  Craig MacDonald,et al.  Can Twitter Replace Newswire for Breaking News? , 2013, ICWSM.

[28]  Rajeev Motwani,et al.  Beyond market baskets: generalizing association rules to correlations , 1997, SIGMOD '97.

[29]  Ram Mohana Reddy Guddeti,et al.  Influence factor based opinion mining of Twitter data using supervised learning , 2014, 2014 Sixth International Conference on Communication Systems and Networks (COMSNETS).

[30]  Fernando Diaz,et al.  Time is of the essence: improving recency ranking using Twitter data , 2010, WWW '10.

[31]  Kristina Lerman,et al.  Information Contagion: An Empirical Study of the Spread of News on Digg and Twitter Social Networks , 2010, ICWSM.

[32]  Andreas M. Kaplan,et al.  The fairyland of Second Life: Virtual social worlds and how to use them , 2009 .

[33]  Ramakrishnan Srikant,et al.  Fast algorithms for mining association rules , 1998, VLDB 1998.

[34]  Mohamed Medhat Gaber,et al.  TRCM: A Methodology for Temporal Analysis of Evolving Concepts in Twitter , 2013, ICAISC.

[35]  Mohammed J. Zaki,et al.  CHARM: An Efficient Algorithm for Closed Itemset Mining , 2002, SDM.

[36]  Mario Cataldi,et al.  Emerging topic detection on Twitter based on temporal and social terms evaluation , 2010, MDMKDD '10.

[37]  Gustavo Rossi,et al.  An approach to discovering temporal association rules , 2000, SAC '00.

[38]  Brett Meyer,et al.  TwitterReporter: Breaking News Detection and Visualization through the Geo-Tagged Twitter Network , 2011, CATA.

[39]  Jian Pei,et al.  Mining frequent patterns without candidate generation , 2000, SIGMOD '00.

[40]  Ulrich Güntzer,et al.  Algorithms for association rule mining — a general survey and comparison , 2000, SKDD.

[41]  Jonathon Read,et al.  Using Emoticons to Reduce Dependency in Machine Learning Techniques for Sentiment Classification , 2005, ACL.

[42]  Matthew Hurst,et al.  A Language Model Approach to Keyphrase Extraction , 2003, ACL 2003.

[43]  Richard Colbaugh,et al.  Toward Emerging Topic Detection for Business Intelligence: Predictive Analysis of 'Meme' Dynamics , 2010, ArXiv.

[44]  Johan Bollen,et al.  Modeling Public Mood and Emotion: Twitter Sentiment and Socio-Economic Phenomena , 2009, ICWSM.

[45]  Shrikanth S. Narayanan,et al.  A System for Real-time Twitter Sentiment Analysis of 2012 U.S. Presidential Election Cycle , 2012, ACL.

[46]  Ramakrishnan Srikant,et al.  Mining quantitative association rules in large relational tables , 1996, SIGMOD '96.

[47]  Jugal K. Kalita,et al.  Comparing Twitter Summarization Algorithms for Multiple Post Summaries , 2011, 2011 IEEE Third Int'l Conference on Privacy, Security, Risk and Trust and 2011 IEEE Third Int'l Conference on Social Computing.

[48]  David A. Shamma,et al.  Characterizing debate performance via aggregated twitter sentiment , 2010, CHI.

[49]  Lon Safko,et al.  The Social Media Bible: Tactics, Tools, and Strategies for Business Success , 2009 .

[50]  Hosung Park,et al.  What is Twitter, a social network or a news media? , 2010, WWW '10.

[51]  Miles Osborne,et al.  Streaming First Story Detection with application to Twitter , 2010, NAACL.

[52]  H. Raghav Rao,et al.  Retweeting the Fukushima nuclear radiation disaster , 2014, CACM.

[53]  René Peinl,et al.  Performance of graph query languages: comparison of cypher, gremlin and native access in Neo4j , 2013, EDBT '13.

[54]  Pak Chung Wong,et al.  Visualizing association rules for text mining , 1999, Proceedings 1999 IEEE Symposium on Information Visualization (InfoVis'99).

[55]  Gilad Mishne,et al.  Finding high-quality content in social media , 2008, WSDM '08.

[56]  N. Newman.  The rise of social media and its impact on mainstream journalism , 2009 .

[57]  Carmen Holotescu,et al.  CAN WE USE TWITTER FOR EDUCATIONAL ACTIVITIES , 2008 .

[58]  Jinyan Li,et al.  Efficient Mining of Emerging Patterns: Discovering Trends and Differences , 1999 .

[59]  Hila Becker,et al.  Identifying content for planned events across social media sites , 2012, WSDM '12.

[60]  Jugal K. Kalita,et al.  Streaming trend detection in Twitter , 2013, Int. J. Web Based Communities.

[61]  Isabell M. Welpe,et al.  Predicting Elections with Twitter: What 140 Characters Reveal about Political Sentiment , 2010, ICWSM.

[62]  Mohamed Medhat Gaber,et al.  Autonomic Discovery of News Evolvement in Twitter , 2015 .

[63]  Johanna D. Moore,et al.  Twitter Sentiment Analysis: The Good the Bad and the OMG! , 2011, ICWSM.

[64]  Matthew Hurst,et al.  Event Detection and Tracking in Social Streams , 2009, ICWSM.

[65]  Dominic L. Lasorsa,et al.  NORMALIZING TWITTER , 2012 .

[66]  Lillian Lee,et al.  Opinion Mining and Sentiment Analysis , 2008, Found. Trends Inf. Retr..

[67]  Dave Evans,et al.  Social Media Marketing: The Next Generation of Business Engagement , 2010 .

[68]  A. Akhmetova.  Discovery of Frequent Episodes in Event Sequences , 2006 .

[69]  Yasuhiko Morimoto,et al.  Data mining using two-dimensional optimized association rules: scheme, algorithms, and visualization , 1996, SIGMOD '96.

[70]  Mehmed Kantardzic,et al.  Data-Mining Concepts , 2011 .

[71]  Duen-Ren Liu,et al.  Mining the change of event trends for decision support in environmental scanning , 2009, Expert Syst. Appl..