Click-based Hot Fixes for Underperforming Torso Queries

Ranking documents by their historical click-through rate (CTR) can improve relevance for frequently occurring queries, i.e., so-called head queries. It is difficult to use such click signals for non-head queries because they receive far fewer clicks. In this paper, we address the challenge of dealing with torso queries on which the production ranker performs poorly. Torso queries occur frequently enough not to be considered tail queries, yet not frequently enough to qualify as head queries. They make up a large portion of most commercial search engines' traffic, so a large number of underperforming torso queries can significantly harm overall performance. We propose a practical method for handling such cases, drawing inspiration from the literature on learning to rank (LTR). Our method requires relatively few user clicks to derive a strong re-ranking signal because it compares relevance between pairs of documents rather than relying on absolute click counts per document. By infusing a modest amount of exploration into the ranked lists produced by a production ranker and extracting preferences between documents, we obtain substantial improvements over the production ranker in terms of page-level online metrics. We demonstrate the effectiveness of the method on an exploration dataset of real user clicks from a large-scale commercial search engine, and we conduct further experiments on public benchmark data with simulated clicks to gain insight into the method's inner workings. Our results indicate a need for LTR methods that make more explicit use of the query and other contextual information.

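Since the abstract describes the approach only at a high level, the following Python sketch is a rough illustration of the general idea: pairwise preferences are extracted from clicks on mildly perturbed result lists and then used to re-rank the documents of a single underperforming query. The log format, the function names (extract_preferences, rerank), and the skip-above counting rule are assumptions made for illustration, not the exact procedure of the paper.

```python
from collections import defaultdict
from itertools import combinations

def extract_preferences(impressions):
    """Count pairwise preferences from click logs.

    Each impression is a (ranked_docs, clicked) pair: the displayed ranking
    (top to bottom, possibly perturbed by a small amount of exploration) and
    the set of clicked document ids. A click on a document shown below an
    unclicked document is counted as a preference for the clicked document
    (a skip-above heuristic; an illustrative assumption, not the paper's rule).
    """
    wins = defaultdict(int)  # wins[(a, b)]: impressions in which a beat b
    for ranked_docs, clicked in impressions:
        for upper, lower in combinations(ranked_docs, 2):  # upper shown above lower
            if lower in clicked and upper not in clicked:
                wins[(lower, upper)] += 1
    return wins

def rerank(docs, wins):
    """Re-rank one query's documents by their net preference wins."""
    def net_wins(d):
        return sum(wins[(d, o)] - wins[(o, d)] for o in docs if o != d)
    return sorted(docs, key=net_wins, reverse=True)

if __name__ == "__main__":
    # Toy log: three impressions of the same torso query; the second one
    # swaps the top two documents as a modest amount of exploration.
    log = [
        (["d1", "d2", "d3"], {"d2"}),
        (["d2", "d1", "d3"], {"d2"}),
        (["d1", "d2", "d3"], {"d3"}),
    ]
    prefs = extract_preferences(log)
    print(rerank(["d1", "d2", "d3"], prefs))  # -> ['d3', 'd2', 'd1']
```

The point of the pairwise formulation is that a preference between two documents can be established from far fewer impressions than a reliable absolute CTR estimate per document, which is what makes the approach usable on torso queries.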