Learning to Rank with Selection Bias in Personal Search

Click-through data has proven to be a critical resource for improving search ranking quality. Although a large amount of click data can be easily collected by search engines, various biases make it difficult to fully leverage this type of data. Many click models have been proposed and successfully used to estimate the relevance of individual query-document pairs in the context of web search. However, these click models typically require a large number of clicks for each individual pair, which makes them difficult to apply in systems where click data is highly sparse due to personalized corpora and information needs, e.g., personal search. In this paper, we study how to leverage sparse click data in personal search: we introduce a novel selection bias problem and address it within the learning-to-rank framework. We propose several bias estimation methods, including a novel query-dependent method that captures queries with similar results and can successfully deal with sparse data. Through online experiments with one of the world's largest personal search engines, we empirically demonstrate that learning-to-rank which accounts for query-dependent selection bias yields significant improvements in search effectiveness.
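
The abstract does not spell out the bias correction itself, so the following is only a minimal sketch of the general idea, assuming (as is common in counterfactual learning-to-rank) that selection bias is handled by weighting each query's contribution to the training loss by the inverse of an estimated selection probability. The names `pairwise_logistic_loss`, `bias_corrected_risk`, and `propensity` are illustrative placeholders, not the paper's actual formulation or API.

```python
import numpy as np

def pairwise_logistic_loss(scores, clicks):
    """Pairwise logistic loss over all (clicked, non-clicked) result pairs
    of a single query; clicks serve as (biased) relevance labels."""
    loss = 0.0
    for i in range(len(scores)):
        for j in range(len(scores)):
            if clicks[i] > clicks[j]:
                loss += np.log1p(np.exp(scores[j] - scores[i]))
    return loss

def bias_corrected_risk(query_log, propensity):
    """Sketch of a selection-bias-corrected empirical risk: each query's loss
    is weighted by the inverse of its estimated selection probability, so
    queries that are under-represented in the click log count for more."""
    risk = 0.0
    for q in query_log:
        w = 1.0 / max(propensity(q), 1e-6)  # inverse-propensity weight
        risk += w * pairwise_logistic_loss(q["scores"], q["clicks"])
    return risk / len(query_log)

# Hypothetical toy log: one query whose clicked result was ranked third, with
# an assumed selection probability of 0.3 supplied by some bias estimator.
log = [{"scores": np.array([0.2, 0.1, 0.9]), "clicks": [0, 0, 1]}]
print(bias_corrected_risk(log, propensity=lambda q: 0.3))
```

In this sketch the query-dependent variant described in the abstract would correspond to a `propensity` function that returns different estimates for different (groups of similar) queries, rather than a single global constant.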
