Reusing historical interaction data for faster online learning to rank for IR

Online learning to rank for information retrieval (IR) holds promise for the development of "self-learning" search engines that can automatically adjust to their users. With the large amounts of interaction data, such as clicks, that can be collected in web search settings, such techniques could enable highly scalable ranking optimization. However, feedback obtained from user interactions is noisy, and developing approaches that can learn from this feedback quickly and reliably is a major challenge. In this paper we investigate whether and how previously collected (historical) interaction data can be used to speed up learning in online learning to rank for IR. We devise the first two methods that can utilize historical data (1) to make the feedback available during learning more reliable and (2) to preselect candidate ranking functions to be evaluated in interactions with users of the retrieval system. We evaluate both approaches on nine learning-to-rank datasets and find that historical data can speed up learning, leading to substantially and significantly higher online performance. In particular, our preselection method proves highly effective at compensating for noise in user feedback. Our results show that historical data can make online learning to rank for IR much more effective than previously possible, especially when feedback is noisy.
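The abstract describes the preselection idea only at a high level. The Python sketch below is a minimal illustration, under stated assumptions, of how logged interaction data might be used to filter candidate rankers before a live interleaved comparison, in the spirit of dueling-bandit-style online learning to rank. The function names, the pool-based candidate generation, and the reciprocal-rank click credit are all illustrative assumptions, not the paper's method; the paper's own estimators are based on inferring interleaved comparison outcomes from historical click data.

```python
import numpy as np

rng = np.random.default_rng(0)

def perturb(weights, delta=1.0):
    """Sample a candidate ranker by perturbing the current weight
    vector in a random direction (dueling-bandit-style exploration)."""
    direction = rng.normal(size=weights.shape)
    direction /= np.linalg.norm(direction)
    return weights + delta * direction

def historical_score(weights, history):
    """Score a candidate ranker on logged interactions: credit it for
    placing historically clicked documents near the top. This is a
    crude, assumed stand-in for estimating interleaved comparison
    outcomes from historical click data."""
    total = 0.0
    for features, clicked in history:           # features: docs x dims
        order = np.argsort(-(features @ weights))  # best doc first
        rank_of = {int(doc): r for r, doc in enumerate(order, start=1)}
        total += sum(1.0 / rank_of[d] for d in clicked)
    return total

def preselect_candidate(current, history, pool_size=10, delta=1.0):
    """Generate a pool of candidate rankers and return the one that
    looks best on historical data; only this candidate would then be
    compared against the current ranker in a live interleaved
    comparison with a real user."""
    pool = [perturb(current, delta) for _ in range(pool_size)]
    return max(pool, key=lambda cand: historical_score(cand, history))

# Toy usage: five logged queries, each with 20 docs x 3 features and a
# set of clicked document indices (all data here is synthetic).
history = [(rng.normal(size=(20, 3)), {0, 3}) for _ in range(5)]
current = np.zeros(3)
candidate = preselect_candidate(current, history)
```

In a full online learner, the winner of the subsequent live comparison would drive a gradient step on the current weights, as in dueling bandit gradient descent; preselection simply spends historical data to avoid wasting live user interactions on unpromising candidates.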
