On the usefulness of query features for learning to rank

Learning to rank studies have mostly focused on query-dependent and query-independent document features, which enable the learning of ranking models of increased effectiveness. Modern learning to rank techniques based on regression trees can additionally support query features, which are document-independent and hence take the same value for all documents being ranked for a query. In doing so, such techniques are able to learn sub-trees that are specific to certain types of query. However, it is unclear which classes of query features are useful for learning to rank, as previous studies leveraged anonymised features. In this work, we examine the usefulness of four classes of query features, based on topic classification, the history of the query in a query log, the predicted performance of the query, and the presence of concepts such as persons and organisations in the query. Through experiments on the ClueWeb09 collection, our results using a state-of-the-art learning to rank technique based on regression trees show that all four classes of query features can significantly improve upon an effective learned model that does not use any query features.
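
To make the mechanism concrete, the sketch below (in Python) shows how document-independent query features are replicated across every document ranked for the same query before a regression-tree ranker is trained. It uses LightGBM's LambdaMART implementation as a stand-in for the paper's learner, and all feature names and values (e.g. a query performance predictor score, a query-log frequency) are hypothetical, chosen only to illustrate the feature layout:

```python
import numpy as np
import lightgbm as lgb

# Hypothetical per-document features for two queries (e.g. BM25 score, PageRank).
doc_features = {
    "q1": np.array([[12.3, 0.4], [10.1, 0.9], [8.7, 0.2]]),
    "q2": np.array([[15.0, 0.1], [9.4, 0.6]]),
}
# Hypothetical query features (e.g. a query performance predictor score and a
# query-log frequency). Document-independent: one vector per query.
query_features = {
    "q1": np.array([0.75, 120.0]),
    "q2": np.array([0.31, 8.0]),
}
# Hypothetical graded relevance labels for each query's documents.
relevance = {"q1": np.array([2, 1, 0]), "q2": np.array([1, 0])}

# Replicate each query's features across all of its documents, so every
# document ranked for the same query carries identical query-feature values.
X, y, group = [], [], []
for qid, docs in doc_features.items():
    qf = np.tile(query_features[qid], (docs.shape[0], 1))
    X.append(np.hstack([docs, qf]))
    y.append(relevance[qid])
    group.append(docs.shape[0])
X, y = np.vstack(X), np.concatenate(y)

# A regression-tree ranker can split on the query-feature columns, in effect
# learning sub-trees that apply only to certain types of query.
ranker = lgb.LGBMRanker(objective="lambdarank", n_estimators=50, min_child_samples=1)
ranker.fit(X, y, group=group)
```

Because the learner may split on a query feature near the root of a tree, the documents of, say, a frequently seen or easily predicted query can be routed to a sub-tree with its own weighting of the document features, which is the behaviour the abstract describes.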
