Balancing Exploration and Exploitation in Learning to Rank Online

As retrieval systems become more complex, learning to rank approaches are being developed to tune their parameters automatically. With online learning to rank, retrieval systems can learn directly from implicit feedback while they are running. In such an online setting, algorithms must both explore new solutions to obtain feedback for effective learning and exploit what has already been learned to produce results that are acceptable to users. We formulate this challenge as an exploration-exploitation dilemma and present the first online learning to rank algorithm that works with implicit feedback and balances exploration and exploitation. We leverage existing learning to rank data sets and recently developed click models to evaluate the proposed algorithm. Our results show that finding a balance between exploration and exploitation can substantially improve online retrieval performance, bringing us one step closer to making online learning to rank work in practice.
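The core idea of balancing exploration and exploitation in a ranked result list can be illustrated with a simple probabilistic interleaving scheme. The sketch below is an illustration, not the paper's exact algorithm: the function name `epsilon_interleave` and the mixing parameter `epsilon` are assumptions introduced here. At each rank, the next unseen document is drawn from an exploratory ranking with probability `epsilon` and from the current best (exploitative) ranking otherwise, so `epsilon = 0` is pure exploitation and `epsilon = 1` is pure exploration.

```python
import random

def epsilon_interleave(exploit, explore, epsilon=0.5, k=10, rng=random):
    """Hypothetical sketch of exploration-exploitation mixing.

    Builds a result list of length at most k. At each position, the
    next not-yet-included document is taken from the exploratory
    ranking with probability epsilon, otherwise from the exploitative
    ranking; an exhausted list falls back to the other one.
    """
    result, i, j = [], 0, 0
    while len(result) < k and (i < len(exploit) or j < len(explore)):
        use_explore = (rng.random() < epsilon and j < len(explore)) \
            or i >= len(exploit)
        if use_explore:
            doc, j = explore[j], j + 1   # take next exploratory document
        else:
            doc, i = exploit[i], i + 1   # take next exploitative document
        if doc not in result:            # skip documents already shown
            result.append(doc)
    return result
```

Clicks observed on documents contributed by the exploratory list would then provide the feedback that drives learning, while the exploitative contributions keep the displayed list acceptable to users.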
