Real-time Detection and Sorting of News on Microblogging Platforms

Due to the increasing popularity of microblogging platforms (e.g., Twitter), detecting real-time news from microblogs (e.g., tweets) has recently drawn a lot of attention. Most previous work on this subject detects news by analyzing the propagation patterns of microblogs. This approach has two limitations: (i) many non-news microblogs (e.g., marketing activities) have propagation patterns similar to those of news microblogs and can therefore be falsely reported as news; (ii) using propagation patterns to identify news involves a delay until the pattern has formed, so news is not detected in real time. Motivated by the need for real-time news detection, we propose an alternative approach that does not rely on the propagation of posts. Moreover, we propose a real-time sorting strategy that orders the detected news microblogs using a translational approach. An experimental evaluation on a large-scale microblogging dataset demonstrates the effectiveness of our approach.
