Combining temporal and content aware features for microblog retrieval

Microblogs, especially Twitter, have become an integral part of daily life for finding the latest news and event information. Because tweets are short, search results based on content relevance alone cannot satisfy users' information needs. Recent research shows that considering temporal aspects improves retrieval performance significantly. In this paper, we propose a method to re-rank search results based on temporal features, account-related features, and Twitter-specific features, along with the textual features of tweets. We also apply a two-stage query expansion technique to improve the relevance of the retrieved tweets. After automatic feature selection using LASSO and elastic-net regularization, we apply a random forest as a feature-ranking method to estimate the importance of each selected feature. Using these importance scores, a weighted ranking model combines the feature values to estimate the relevance score. We conducted our experiments on the TREC Microblog 2011 and 2012 queries over the TREC Tweets2011 collection. The experimental results demonstrate the effectiveness of our method over the baseline in terms of precision at 30 (P@30), mean average precision (MAP), and reciprocal-precision (R-Prec) metrics.
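
The final re-ranking step can be illustrated with a minimal sketch. The abstract does not give the exact form of the weighted ranking model, so the code below assumes a simple one: each feature is min-max normalized across the candidate tweets, and the relevance score is the sum of the normalized values weighted by importance scores (for example, taken from a random forest). The feature names, the importance values, and the helper functions are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Set


def min_max_normalize(values: List[float]) -> List[float]:
    """Scale values to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def rerank(tweets: List[Dict[str, float]], importance: Dict[str, float]) -> List[int]:
    """Return tweet indices ordered by descending weighted relevance score.

    Assumption: the score is a weighted sum of min-max normalized features,
    with importance scores rescaled to sum to 1.
    """
    total = sum(importance.values())
    weights = {name: w / total for name, w in importance.items()}
    # Normalize each feature column across the candidate tweets.
    columns = {
        name: min_max_normalize([t[name] for t in tweets]) for name in weights
    }
    scores = [
        sum(weights[name] * columns[name][i] for name in weights)
        for i in range(len(tweets))
    ]
    return sorted(range(len(tweets)), key=lambda i: scores[i], reverse=True)


def precision_at_k(ranking: List[int], relevant: Set[int], k: int) -> float:
    """Fraction of the top-k ranked items that are relevant."""
    return sum(1 for i in ranking[:k] if i in relevant) / k


# Hypothetical candidates with one textual, one temporal,
# and one Twitter-specific feature each.
tweets = [
    {"text_score": 0.9, "recency": 0.1, "has_url": 0.0},
    {"text_score": 0.7, "recency": 0.9, "has_url": 1.0},
    {"text_score": 0.2, "recency": 0.5, "has_url": 0.0},
]
# Hypothetical importance scores for the selected features.
importance = {"text_score": 0.5, "recency": 0.3, "has_url": 0.2}

ranking = rerank(tweets, importance)
print(ranking)
print(precision_at_k(ranking, {1, 2}, 2))
```

In this toy example the second tweet moves to the top because its recency and URL features outweigh its slightly lower textual score, which is the kind of effect that combining temporal and content features is meant to capture.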
