Online Learning to Rank for Information Retrieval: SIGIR 2016 Tutorial

During the past 10--15 years, offline learning to rank has had a tremendous influence on information retrieval, both scientifically and in practice. Recently, as the limitations of offline learning to rank have become apparent, the community has paid increasing attention to online learning to rank methods for information retrieval. Such methods learn from user interactions rather than from a set of labeled data that is fully available for training up front. Below we describe why we believe the time is right for an intermediate-level tutorial on online learning to rank, the objectives of the proposed tutorial, and its relevance, as well as more practical details such as format, schedule, and support materials.
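
To make the core idea of learning from user interactions concrete, the following is a minimal sketch of an online learning to rank loop in the style of Dueling Bandit Gradient Descent (Yue and Joachims, 2009). It assumes a linear ranker over feature vectors and replaces real interleaved click feedback with a simulated preference signal; all names, parameters, and the simulated user model are illustrative assumptions, not part of the tutorial material.

```python
# Minimal online learning to rank sketch (DBGD-style), with a simulated
# preference signal standing in for interleaved click feedback.
import numpy as np

rng = np.random.default_rng(42)
n_features = 10   # assumed number of ranking features per query-document pair
delta = 1.0       # exploration step size (assumed)
alpha = 0.1       # learning rate (assumed)

def rank(weights, doc_features):
    """Order documents by descending linear score."""
    return np.argsort(-(doc_features @ weights))

def user_prefers_candidate(current_w, candidate_w, doc_features):
    """Stand-in for an interleaved comparison: a hidden 'true' relevance
    model decides which ranking the simulated user would prefer. In
    practice this signal comes from clicks on an interleaved result list."""
    true_w = np.ones(n_features)  # assumed hidden relevance model
    true_scores = doc_features @ true_w

    def utility(w):
        order = rank(w, doc_features)
        # discounted cumulative utility of the proposed ranking
        return sum(true_scores[d] / np.log2(i + 2) for i, d in enumerate(order))

    return utility(candidate_w) > utility(current_w)

# Online loop: one exploratory comparison and (possibly) one update per query.
w = np.zeros(n_features)
for query in range(1000):
    docs = rng.normal(size=(20, n_features))   # candidate documents for this query
    u = rng.normal(size=n_features)
    u /= np.linalg.norm(u)                     # random unit exploration direction
    w_candidate = w + delta * u
    if user_prefers_candidate(w, w_candidate, docs):
        w = w + alpha * u                      # move toward the preferred ranker
```

The key contrast with offline learning to rank is visible in the loop: the ranker is updated incrementally from a per-query preference signal gathered while serving results, rather than trained once on a fully labeled collection.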
