Evaluating Learning-to-Rank Methods in the Web Track Adhoc Task

Learning-to-rank methods are becoming ubiquitous in information retrieval. Their advantage lies in the ability to combine a large number of low-impact relevance signals, which requires large training and test data sets. A large test set is also needed to verify, with statistical methods, the usefulness of specific relevance signals. Several publicly available data collections are geared towards the evaluation of learning-to-rank methods. These collections are large, but they typically provide only a fixed set of precomputed (and often anonymized) relevance signals, so computing new signals may be impossible. This limitation motivated us to experiment with learning-to-rank methods on the TREC Web adhoc collection. Specifically, we compared the performance of learning-to-rank methods with that of a hand-tuned formula based on the same set of relevance signals. Even though the TREC data set did not have enough queries to draw definitive conclusions, the hand-tuned formula appeared to be on par with the learning-to-rank methods.
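
Below is a minimal sketch, not the authors' actual pipeline, that contrasts the two approaches on hypothetical relevance signals (BM25, term proximity, and PageRank scores): a hand-tuned linear formula with fixed weights versus a pairwise, RankSVM-style learner that fits weights from judged document pairs. All names, weights, and data here are illustrative assumptions.

# A minimal sketch (assumed signals and weights, not the paper's setup):
# hand-tuned linear formula vs. a pairwise RankSVM-style ranker.
import numpy as np
from itertools import combinations
from sklearn.svm import LinearSVC

# Toy data: one row of relevance signals per (query, document) pair.
# Columns: [bm25, proximity, pagerank]; y holds binary relevance judgments.
rng = np.random.default_rng(0)
X = rng.random((200, 3))
y = (2.0 * X[:, 0] + 0.5 * X[:, 1] + 0.3 * X[:, 2]
     + 0.1 * rng.standard_normal(200) > 1.4).astype(int)
query_ids = np.repeat(np.arange(20), 10)  # 20 queries, 10 documents each

def hand_tuned_score(signals, weights=(1.0, 0.3, 0.2)):
    """Hand-tuned formula: a fixed linear combination of the signals."""
    return signals @ np.asarray(weights)

def pairwise_training_data(X, y, query_ids):
    """Build difference vectors for document pairs within the same query."""
    diffs, labels = [], []
    for q in np.unique(query_ids):
        idx = np.where(query_ids == q)[0]
        for i, j in combinations(idx, 2):
            if y[i] == y[j]:
                continue  # skip pairs with identical relevance
            diffs.append(X[i] - X[j])
            labels.append(1 if y[i] > y[j] else -1)
    return np.array(diffs), np.array(labels)

# RankSVM-style learning to rank: a linear classifier on pairwise differences.
X_pairs, y_pairs = pairwise_training_data(X, y, query_ids)
ranker = LinearSVC(C=1.0).fit(X_pairs, y_pairs)

# Both approaches reduce to a linear scoring function over the same signals;
# the only difference is whether the weights are set by hand or learned.
learned_scores = X @ ranker.coef_.ravel()
manual_scores = hand_tuned_score(X)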
