Finding non-redundant multi-word events on Twitter

Twitter is a pervasive technology, with hundreds of millions of users serving as sensors that provide eyewitness accounts of events on the ground. In the case of popular events, these sensors start to broadcast news by tweeting to their followers and to the world. Within minutes these tweets can attract attention and serve as a primary information source for traditional media. Given a huge set of tweets, the key questions are: (1) How can we detect informative events in general? (2) How can we distinguish relevant events from others? In this paper we tackle these challenges with a statistical model that detects events by spotting significant deviations in word frequencies over time. Besides single-word events, our model also accounts for events composed of multiple co-occurring words, thus providing much richer information. Our statistical process is complemented by an optimization algorithm that extracts only non-redundant events, providing the user with a succinct summary of current events. We used our model to analyze 24 million geotagged tweets sent in the US from April 9 to April 22, 2013 (the period of the Boston Marathon bombing), and we show that our approach can create multi-word events that efficiently summarize real-world events.
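The idea of spotting significant deviations in word frequency over time can be illustrated with a minimal sketch. This is not the statistical model of the paper, whose details the abstract does not give; it swaps in a plain z-score against past time windows. The function names `word_counts` and `bursty_words`, the thresholds `z_threshold` and `min_count`, and the floor on the standard deviation are all assumptions made for illustration.

```python
from collections import Counter
from math import sqrt


def word_counts(tweets):
    """Count, for each lowercase token, the number of tweets containing it."""
    counts = Counter()
    for text in tweets:
        # Use a set so a word repeated within one tweet is counted once.
        counts.update(set(text.lower().split()))
    return counts


def bursty_words(history, current, z_threshold=3.0, min_count=3):
    """Flag words whose count in `current` deviates strongly from past windows.

    history: non-empty list of past time windows, each a list of tweet strings.
    current: list of tweet strings for the window under inspection.
    Returns a dict mapping each flagged word to its z-score.
    """
    past = [word_counts(window) for window in history]
    now = word_counts(current)
    flagged = {}
    for word, count in now.items():
        if count < min_count:
            continue  # ignore words that are too rare to be meaningful
        samples = [c[word] for c in past]  # Counter returns 0 if unseen
        mean = sum(samples) / len(samples)
        var = sum((s - mean) ** 2 for s in samples) / len(samples)
        # Assumption: floor the standard deviation at 1 so that words with
        # a constant history do not cause a division by zero.
        std = max(sqrt(var), 1.0)
        z = (count - mean) / std
        if z >= z_threshold:
            flagged[word] = z
    return flagged
```

Calling `bursty_words(history, current)` with a few quiet past windows and a current window in which a previously unseen word appears in several tweets returns that word with its z-score. Extending the sketch to multi-word events would require counting co-occurring word sets rather than single tokens, and removing redundant events would need a further selection step, both of which are left out here.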
