The K-armed Dueling Bandits Problem

Thorsten Joachims | Yisong Yue | Robert D. Kleinberg | Josef Broder

[1] Robert D. Kleinberg, et al. Regret bounds for sleeping experts and bandits, 2010, Machine Learning.

[2] Pinar Donmez, et al. On the local optimality of LambdaRank, 2009, SIGIR.

[3] Jean-Yves Audibert, et al. Minimax Policies for Adversarial and Stochastic Bandits, 2009, COLT 2009.

[4] Thorsten Joachims, et al. Interactively optimizing information retrieval systems as a dueling bandits problem, 2009, ICML '09.

[5] Filip Radlinski, et al. How does clickthrough data reflect retrieval quality?, 2008, CIKM '08.

[6] Avinatan Hassidim, et al. The Bayesian Learner is Optimal for Noisy Binary Search (and Pretty Good for Quantum as Well), 2008, 2008 49th Annual IEEE Symposium on Foundations of Computer Science.

[7] Mehryar Mohri, et al. An Efficient Reduction of Ranking to Classification, 2007, COLT.

[8] Rocco A. Servedio, et al. Boosting the Area under the ROC Curve, 2007, NIPS.

[9] J. Langford, et al. The Epoch-Greedy algorithm for contextual multi-armed bandits, 2007, NIPS 2007.

[10] Richard M. Karp, et al. Noisy binary search and its applications, 2007, SODA '07.

[11] Deepayan Chakrabarti, et al. Bandits for Taxonomies: A Model-based Approach, 2007, SDM.

[12] Shie Mannor, et al. Action Elimination and Stopping Conditions for the Multi-Armed Bandit and Reinforcement Learning Problems, 2006, J. Mach. Learn. Res.

[13] Nicolò Cesa-Bianchi, et al. Regret Minimization Under Partial Monitoring, 2006, 2006 IEEE Information Theory Workshop - ITW '06 Punta del Este.

[14] Anonymous Author. Robust Reductions from Ranking to Classification, 2006.

[15] Thorsten Joachims, et al. A support vector method for multivariate performance measures, 2005, ICML.

[16] Peter Auer, et al. Finite-time Analysis of the Multiarmed Bandit Problem, 2002, Machine Learning.

[17] John N. Tsitsiklis, et al. The Sample Complexity of Exploration in the Multi-Armed Bandit Problem, 2004, J. Mach. Learn. Res.

[18] Peter Auer, et al. Using Confidence Bounds for Exploitation-Exploration Trade-offs, 2003, J. Mach. Learn. Res.

[19] Peter Auer, et al. The Nonstochastic Multiarmed Bandit Problem, 2002, SIAM J. Comput.

[20] Klaus Obermayer, et al. Support vector learning for ordinal regression, 1999.

[21] Yoram Singer, et al. An Efficient Boosting Algorithm for Combining Preferences, 2003.

[22] Yoram Singer, et al. Learning to Order Things, 1997, NIPS.

[23] Rajeev Motwani, et al. Randomized Algorithms, 1995, SIGA.

[24] Eli Upfal, et al. Computing with Noisy Information, 1994, SIAM J. Comput.

[25] Claire Mathieu, et al. Selection in the presence of noise: the design of playoff systems, 1994, SODA '94.

[26] N. Fisher, et al. Probability Inequalities for Sums of Bounded Random Variables, 1994.

[27] Thomas M. Cover, et al. Elements of Information Theory, 2005.

[28] H. Robbins. Some aspects of the sequential design of experiments, 1952.

[29] T. L. Lai and Herbert Robbins. Asymptotically Efficient Adaptive Allocation Rules, 1985.