Contextual Bandits for Information Retrieval

In this paper we give an overview of and outlook on research at the intersection of information retrieval (IR) and contextual bandit problems. A critical problem in information retrieval is online learning to rank, where a search engine strives to improve the quality of the ranked result lists it presents to users on the basis of how those users interact with those lists. Recently, researchers have started to model interactions between users and search engines as contextual bandit problems, and initial methods for learning in this setting have been devised. Our research focuses on two aspects: balancing exploration and exploitation, and inferring preferences from implicit user interactions. This paper summarizes our recent work on online learning to rank for information retrieval and points out challenges that are characteristic of this application area.
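To make the interaction loop concrete, the following is a minimal, hypothetical Python sketch of online learning to rank with explicit exploration, loosely in the spirit of dueling-bandit-style approaches: a linear ranker is perturbed in a random direction, simulated clicks stand in for implicit user feedback, and the exploratory ranker is adopted only when it attracts more clicks. All names, the click simulator, the hidden relevance model, and the comparison by raw click counts are simplifying assumptions for illustration; they are not the method studied in the paper, which interleaves the two rankings and infers preferences from the resulting clicks.

```python
import numpy as np

rng = np.random.default_rng(0)

def rank(weights, doc_features):
    """Score documents with a linear model and return their indices best-first."""
    scores = doc_features @ weights
    return np.argsort(-scores)

def simulate_clicks(ranking, relevance, p_click=0.8, p_stop=0.5):
    """Crude cascade-style click simulation standing in for real user feedback."""
    clicks = []
    for doc in ranking:
        if relevance[doc] and rng.random() < p_click:
            clicks.append(doc)
            if rng.random() < p_stop:   # satisfied user stops examining results
                break
    return clicks

dim, n_docs, n_queries = 5, 20, 1000
step, delta = 0.1, 1.0                  # update step size, exploration radius
w = np.zeros(dim)                       # current (exploitative) ranker
w_true = rng.normal(size=dim)           # hidden ranker used only by the simulator

for _ in range(n_queries):
    X = rng.normal(size=(n_docs, dim))  # per-query document feature vectors
    relevance = (X @ w_true) > 1.0      # hidden relevance labels for this query

    u = rng.normal(size=dim)
    u /= np.linalg.norm(u)              # random exploration direction
    w_explore = w + delta * u           # exploratory candidate ranker

    clicks_current = simulate_clicks(rank(w, X), relevance)
    clicks_explore = simulate_clicks(rank(w_explore, X), relevance)

    # Exploit the current ranker unless the exploratory one attracts more clicks.
    if len(clicks_explore) > len(clicks_current):
        w += step * u
```

The trade-off the paper highlights appears directly in the two knobs of this sketch: a larger exploration radius (delta) gathers more informative feedback but degrades the lists users see, while a smaller one exploits the current ranker at the cost of slower learning.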
