Ranking model selection and fusion for effective microblog search

Re-ranking has been shown to improve the effectiveness of microblog search. Yet existing approaches mostly rely on a single ranker to learn a better ranking function over various relevance features. Given the range of available rank learners (such as learning-to-rank algorithms), in this work we study an orthogonal problem: rather than using a single ranking model, multiple learned ranking models form an ensemble for re-ranking the retrieved tweets in order to achieve higher search effectiveness. We explore query-sensitive model selection and rank fusion methods based on the result lists produced by the multiple rank learners. On the TREC Microblog datasets, we find that our selection-based ensemble approach significantly outperforms the single best ranker, and it also has a clear advantage over rank fusion that combines the results of all available models.
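To make the rank-fusion baseline mentioned above concrete, the minimal sketch below combines the result lists of several learned rankers with reciprocal rank fusion (RRF), a standard unsupervised fusion method. The ranker names and tweet IDs are hypothetical, and the paper's query-sensitive model selection step, which would instead pick one model's list per query, is not shown.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse several ranked lists of tweet IDs into a single list.

    Each input list is assumed to be ordered from most to least
    relevant; k is the usual RRF smoothing constant.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, tweet_id in enumerate(ranking, start=1):
            scores[tweet_id] += 1.0 / (k + rank)
    # A higher fused score means a higher final rank.
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical per-query result lists from three learned ranking models.
lambdamart_run = ["t3", "t1", "t7", "t2"]
ranknet_run = ["t1", "t3", "t2", "t9"]
forest_run = ["t3", "t2", "t1", "t7"]

fused = reciprocal_rank_fusion([lambdamart_run, ranknet_run, forest_run])
print(fused)  # e.g. ['t3', 't1', 't2', 't7', 't9']
```

A selection-based ensemble, by contrast, would use query features to choose the single most promising model for each query instead of merging all lists.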
