Event Detection in Twitter Microblogging

The millions of tweets submitted daily overwhelm users, who find it difficult to identify content of interest; this reveals the need for event detection algorithms in Twitter. This paper proposes such algorithms for both short-term periods (identifying what is currently happening) and long-term periods (reviewing the most salient recently submitted events). For both scenarios, we propose fuzzy-represented, time-evolving, tweet-based information-theoretic metrics to model Twitter dynamics. The Riemannian distance between words' signatures is also exploited to minimize temporal effects due to submission delays. Events are detected through a multiassignment graph partitioning algorithm that 1) retains maximum coherence within each cluster while 2) allowing a word to belong to several clusters (events). Experimental results on real-life data demonstrate that our approach outperforms other methods.
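
The abstract does not define a word's signature, so the following is only a minimal sketch of what a Riemannian distance computes. It assumes, purely for illustration, that each signature has been summarized as a 2x2 symmetric positive-definite matrix (for example, a covariance of two per-word features); the function name `riemannian_distance_2x2` and this matrix representation are hypothetical and are not taken from the paper. The sketch uses the standard affine-invariant distance, the square root of the sum of squared logarithms of the eigenvalues of B^{-1}A.

```python
import math


def riemannian_distance_2x2(a, b):
    """Affine-invariant distance between two 2x2 symmetric positive-definite
    matrices given as nested sequences, e.g. [[2.0, 0.5], [0.5, 1.0]].

    Assumption: the matrices stand in for word signatures; the paper's actual
    signature definition is not given in the abstract.
    """
    (a11, a12), (a21, a22) = a
    (b11, b12), (b21, b22) = b
    det_a = a11 * a22 - a12 * a21
    det_b = b11 * b22 - b12 * b21
    if a11 <= 0 or b11 <= 0 or det_a <= 0 or det_b <= 0:
        raise ValueError("both matrices must be positive definite")

    # Trace and determinant of B^{-1} A, using the closed-form 2x2 inverse.
    trace = (b22 * a11 - b12 * a21 - b21 * a12 + b11 * a22) / det_b
    det = det_a / det_b

    # Eigenvalues of B^{-1} A solve x^2 - trace * x + det = 0.
    # Clamp the discriminant at zero to absorb rounding error.
    disc = max(trace * trace - 4.0 * det, 0.0)
    lam1 = (trace + math.sqrt(disc)) / 2.0
    lam2 = det / lam1  # product of the eigenvalues equals det

    return math.sqrt(math.log(lam1) ** 2 + math.log(lam2) ** 2)


if __name__ == "__main__":
    sig_a = [[2.0, 0.5], [0.5, 1.0]]
    sig_b = [[1.0, 0.2], [0.2, 3.0]]
    print(riemannian_distance_2x2(sig_a, sig_b))
```

The distance is zero for identical matrices, symmetric in its arguments, and unchanged when both matrices are rescaled by the same factor, which is one reason such metrics are used to compare covariance-like descriptors.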
