On Improving a Microblog Ranking

Microblog ranking has been an active research topic in recent years. Most related work applies the TF-IDF metric to calculate content similarity while neglecting semantic similarity. Moreover, most existing search engines retrieve microblogs by string-matching the search keywords, and therefore cannot provide a reliable list to users when queries involve polysemy or synonymy. In addition, treating all users as equally authoritative on all topics is intuitively not ideal. In this paper, we propose a comprehensive strategy for microblog ranking. First, we extend the conventional TF-IDF-based content similarity by exploiting knowledge from WordNet. Second, we incorporate a new ranking feature: the topical relation between the search keyword and the retrieved microblogs. Third, we incorporate author topical authority into the ranking framework as an important feature. A Gradient Boosting Decision Tree (GBDT) is then employed to train the ranking model over these features. We conduct thorough experiments on a large-scale real-world Twitter dataset and demonstrate that the proposed approach outperforms a number of existing approaches in discovering higher-quality and more relevant microblogs.
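To illustrate why plain TF-IDF matching struggles with synonyms, the following is a minimal sketch, not the method of this paper. It computes TF-IDF cosine similarity between a query and a microblog, with and without mapping synonyms to a shared term. The `SYNONYMS` table is a hypothetical hand-written stand-in for WordNet lookups, and the function names, smoothed IDF formula, and sample texts are all assumptions made for the example.

```python
import math
from collections import Counter

# Hypothetical synonym table standing in for WordNet synsets (assumption).
SYNONYMS = {"car": "automobile", "auto": "automobile", "film": "movie"}


def normalize(tokens):
    """Map each token to a canonical synonym when one is known."""
    return [SYNONYMS.get(t, t) for t in tokens]


def tfidf_vectors(docs):
    """Build sparse TF-IDF vectors (dicts) for a list of token lists."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # Smoothed IDF so that no term receives a zero or negative weight.
        vectors.append({
            t: (c / len(doc)) * (math.log((1 + n) / (1 + df[t])) + 1)
            for t, c in tf.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def similarity(query, tweet, corpus, use_synonyms):
    """Score a query against a tweet; corpus supplies background documents."""
    texts = [query, tweet] + list(corpus)
    docs = [t.lower().split() for t in texts]
    if use_synonyms:
        docs = [normalize(d) for d in docs]
    vecs = tfidf_vectors(docs)
    return cosine(vecs[0], vecs[1])


corpus = ["new film tonight"]
plain = similarity("car repair", "automobile repair tips", corpus, False)
semantic = similarity("car repair", "automobile repair tips", corpus, True)
print(plain, semantic)
```

In this toy setting the synonym-aware score is higher, because "car" and "automobile" become the same term before weighting. A real system would replace the table with a WordNet-derived similarity measure and feed the resulting score, alongside topical and authority features, into the learned ranker.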
