MergeDTS: A Method for Effective Large-Scale Online Ranker Evaluation

Online ranker evaluation is one of the key challenges in information retrieval. While the preferences of rankers can be inferred by interleaving methods, choosing which pair of rankers should generate the interleaved list without degrading the user experience too much remains challenging. On the one hand, if two rankers have not been compared enough, the inferred preference can be noisy and inaccurate. On the other hand, if two rankers are compared too many times, the interleaving process inevitably hurts the user experience. This dilemma is known as the exploration versus exploitation tradeoff. It is captured by the $K$-armed dueling bandit problem, which is a variant of the $K$-armed bandit problem, where the feedback comes in the form of pairwise preferences. Today's deployed search systems can evaluate a large number of rankers concurrently, and scaling effectively in the presence of numerous rankers is a critical aspect of $K$-armed dueling bandit problems. In this paper, we focus on solving the large-scale online ranker evaluation problem under the so-called Condorcet assumption, where there exists an optimal ranker that is preferred to all other rankers. We propose Merge Double Thompson Sampling (MergeDTS), which first utilizes a divide-and-conquer strategy that localizes the comparisons carried out by the algorithm to small batches of rankers, and then employs Thompson Sampling (TS) to reduce the comparisons between suboptimal rankers inside these small batches. The effectiveness (regret) and efficiency (time complexity) of MergeDTS are extensively evaluated using examples from the domain of online evaluation for web search. Our main finding is that for large-scale Condorcet ranker evaluation problems, MergeDTS outperforms state-of-the-art dueling bandit algorithms.
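To make the divide-and-conquer idea concrete, the sketch below simulates a MergeDTS-style dueling bandit loop. It is not the authors' exact algorithm: rankers are split into small batches, a Double-Thompson-style rule picks which pair inside a batch to interleave (using Beta posteriors over pairwise win counts), clearly beaten rankers are pruned, and shrunken batches are merged. The synthetic Condorcet preference matrix, batch size, horizon, and pruning threshold are all assumptions made only for illustration.

```python
# A minimal, illustrative sketch of a MergeDTS-style dueling bandit loop
# (not the published algorithm). The preference matrix and all parameters
# below are assumptions made for this example.
import numpy as np

rng = np.random.default_rng(0)

K = 16                # number of rankers (arms)
BATCH_SIZE = 4        # size of the small batches used by divide-and-conquer
HORIZON = 20000       # total number of interleaved comparisons
PRUNE_MARGIN = 0.95   # posterior confidence needed to drop a ranker

# Synthetic Condorcet preference matrix: P[i, j] = prob. that ranker i beats j.
# Ranker 0 is the Condorcet winner (beats every other ranker with prob. > 0.5).
P = np.full((K, K), 0.5)
for i in range(K):
    for j in range(i + 1, K):
        p = 0.5 + 0.4 * (j - i) / K
        P[i, j], P[j, i] = p, 1.0 - p

wins = np.zeros((K, K))  # wins[i, j] = number of duels i won against j

def duel(i, j):
    """Simulate one interleaved comparison between rankers i and j."""
    return (i, j) if rng.random() < P[i, j] else (j, i)

def ts_pick(batch):
    """Double-Thompson-style pair selection inside one batch."""
    # Sample a win probability for every ordered pair from its Beta posterior.
    theta = {(i, j): rng.beta(wins[i, j] + 1, wins[j, i] + 1)
             for i in batch for j in batch if i != j}
    # First arm: best worst-case sampled win probability within the batch.
    first = max(batch, key=lambda i: min(theta[(i, j)] for j in batch if j != i))
    # Second arm: the strongest sampled challenger against the first arm.
    second = max((j for j in batch if j != first),
                 key=lambda j: theta[(j, first)])
    return first, second

def prune(batch):
    """Drop rankers whose posterior says a batch member clearly beats them."""
    keep = []
    for i in batch:
        beaten = any(
            wins[j, i] + wins[i, j] > 0 and
            rng.beta(wins[j, i] + 1, wins[i, j] + 1) > PRUNE_MARGIN
            for j in batch if j != i
        )
        if not beaten:
            keep.append(i)
    return keep

# Divide: start from disjoint small batches of rankers.
batches = [list(range(s, min(s + BATCH_SIZE, K))) for s in range(0, K, BATCH_SIZE)]

for t in range(HORIZON):
    batch = batches[t % len(batches)]
    if len(batch) >= 2:
        i, j = ts_pick(batch)
        winner, loser = duel(i, j)
        wins[winner, loser] += 1
    # Conquer: periodically prune beaten rankers and merge shrunken batches.
    if (t + 1) % 1000 == 0:
        batches = [b for b in (prune(b) for b in batches) if b]
        while len(batches) > 1:
            batches.sort(key=len)
            if len(batches[0]) + len(batches[1]) > BATCH_SIZE:
                break
            smallest = batches.pop(0)
            batches[0].extend(smallest)

print("surviving rankers:", sorted(i for b in batches for i in b))
```

The design point the sketch is meant to convey is the localization of comparisons: pair selection and pruning only ever look at the handful of rankers inside one batch, so the per-step cost is governed by the batch size rather than by the total number of rankers, which is what makes the approach attractive at large scale.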
