Generating Pseudo Test Collections for Learning to Rank Scientific Articles

Pseudo test collections are automatically generated to provide training material for learning to rank methods. We propose a method for generating pseudo test collections in the domain of digital libraries, where data is relatively sparse but comes with rich annotations. Our intuition is that documents are annotated to make them more findable for certain information needs. We use these annotations and the associated documents as a source of query-document pairs in which the document is relevant to the query. We investigate how learning to rank performance varies when we use different methods for sampling annotations, and we show how our pseudo test collection ranks systems compared to editorial topics with editorial judgments. Our results demonstrate that it is possible to train a learning to rank algorithm on generated pseudo judgments; in some cases, performance is on par with learning on manually obtained ground truth.
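
The core idea, turning document annotations into pseudo queries paired with the annotated documents as pseudo-relevant results, can be sketched in a few lines of Python. This is a minimal illustration, not the authors' exact procedure: the input format (documents with "id" and "annotations" fields), the random sampling of annotations, and all names are assumptions made for the example; the paper itself compares several annotation sampling methods.

    # Minimal sketch: build pseudo (query, relevant-documents) pairs from
    # annotated documents. Input format and sampling strategy are illustrative
    # assumptions, not the method exactly as described in the paper.
    import random
    from collections import defaultdict

    def build_pseudo_test_collection(documents, seed=42, max_queries=1000):
        """Turn document annotations into pseudo queries with pseudo-relevant docs."""
        rng = random.Random(seed)

        # Index documents by annotation: every document carrying an annotation
        # is treated as relevant for the information need that annotation expresses.
        docs_by_annotation = defaultdict(set)
        for doc in documents:
            for annotation in doc["annotations"]:
                docs_by_annotation[annotation].add(doc["id"])

        # Sample annotations to act as pseudo queries (uniform sampling here;
        # one of several possible strategies).
        annotations = list(docs_by_annotation)
        rng.shuffle(annotations)
        return {ann: docs_by_annotation[ann] for ann in annotations[:max_queries]}

    if __name__ == "__main__":
        docs = [
            {"id": "d1", "annotations": ["labour migration", "social policy"]},
            {"id": "d2", "annotations": ["labour migration"]},
        ]
        for query, relevant in build_pseudo_test_collection(docs).items():
            print(query, "->", sorted(relevant))

The resulting query-to-relevant-documents mapping can then be fed to any learning to rank trainer in place of editorial judgments.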
