The K-armed Dueling Bandits Problem

[1]  Robert D. Kleinberg,et al.  Regret bounds for sleeping experts and bandits , 2010, Machine Learning.

[2]  Pinar Donmez,et al.  On the local optimality of LambdaRank , 2009, SIGIR.

[3]  Jean-Yves Audibert,et al.  Minimax Policies for Adversarial and Stochastic Bandits , 2009, COLT 2009.

[4]  Thorsten Joachims,et al.  Interactively optimizing information retrieval systems as a dueling bandits problem , 2009, ICML '09.

[5]  Filip Radlinski,et al.  How does clickthrough data reflect retrieval quality? , 2008, CIKM '08.

[6]  Avinatan Hassidim,et al.  The Bayesian Learner is Optimal for Noisy Binary Search (and Pretty Good for Quantum as Well) , 2008, 2008 49th Annual IEEE Symposium on Foundations of Computer Science.

[7]  Mehryar Mohri,et al.  An Efficient Reduction of Ranking to Classification , 2007, COLT.

[8]  Rocco A. Servedio,et al.  Boosting the Area under the ROC Curve , 2007, NIPS.

[9]  J. Langford,et al.  The Epoch-Greedy algorithm for contextual multi-armed bandits , 2007, NIPS 2007.

[10]  Richard M. Karp,et al.  Noisy binary search and its applications , 2007, SODA '07.

[11]  Deepayan Chakrabarti,et al.  Bandits for Taxonomies: A Model-based Approach , 2007, SDM.

[12]  Shie Mannor,et al.  Action Elimination and Stopping Conditions for the Multi-Armed Bandit and Reinforcement Learning Problems , 2006, J. Mach. Learn. Res..

[13]  Nicolò Cesa-Bianchi,et al.  Regret Minimization Under Partial Monitoring , 2006, 2006 IEEE Information Theory Workshop - ITW '06 Punta del Este.

[14]  Anonymous Author.  Robust Reductions from Ranking to Classification , 2006 .

[15]  Thorsten Joachims,et al.  A support vector method for multivariate performance measures , 2005, ICML.

[16]  Peter Auer,et al.  Finite-time Analysis of the Multiarmed Bandit Problem , 2002, Machine Learning.

[17]  John N. Tsitsiklis,et al.  The Sample Complexity of Exploration in the Multi-Armed Bandit Problem , 2004, J. Mach. Learn. Res..

[18]  Peter Auer,et al.  Using Confidence Bounds for Exploitation-Exploration Trade-offs , 2003, J. Mach. Learn. Res..

[19]  Peter Auer,et al.  The Nonstochastic Multiarmed Bandit Problem , 2002, SIAM J. Comput..

[20]  Klaus Obermayer,et al.  Support vector learning for ordinal regression , 1999 .

[21]  Yoram Singer,et al.  An Efficient Boosting Algorithm for Combining Preferences , 2003 .

[22]  Yoram Singer,et al.  Learning to Order Things , 1997, NIPS.

[23]  Rajeev Motwani,et al.  Randomized Algorithms , 1995, SIGA.

[24]  Eli Upfal,et al.  Computing with Noisy Information , 1994, SIAM J. Comput..

[25]  Claire Mathieu,et al.  Selection in the presence of noise: the design of playoff systems , 1994, SODA '94.

[26]  N. Fisher,et al.  Probability Inequalities for Sums of Bounded Random Variables , 1994 .

[27]  Thomas M. Cover,et al.  Elements of Information Theory , 2005 .

[28]  H. Robbins.  Some aspects of the sequential design of experiments , 1952 .

[29]  T. L. Lai and Herbert Robbins.  Asymptotically Efficient Adaptive Allocation Rules , 1985 .