John Langford | Lihong Li | Robert E. Schapire | Alina Beygelzimer | Lev Reyzin
[1] Matthew J. Streeter, et al. Tighter Bounds for Multi-Armed Bandits with Expert Advice. COLT, 2009.
[2] Jacob D. Abernethy, et al. An Efficient Bandit Algorithm for √T-Regret in Online Multiclass Prediction? COLT, 2009.
[3] Deepak Agarwal, et al. Online Models for Content Optimization. NIPS, 2008.
[4] Ambuj Tewari, et al. Efficient Bandit Algorithms for Online Multiclass Prediction. ICML, 2008.
[5] Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.
[6] Peter Auer, et al. Finite-time Analysis of the Multiarmed Bandit Problem. Machine Learning, 2002.
[7] Peter Auer, et al. The Nonstochastic Multiarmed Bandit Problem. SIAM Journal on Computing, 2002.
[8] Robert E. Schapire, et al. Predicting Nearly as Well as the Best Pruning of a Decision Tree. COLT, 1995.
[9] Manfred K. Warmuth, et al. The Weighted Majority Algorithm. FOCS, 1989.
[10] David A. Freedman. On Tail Probabilities for Martingales. The Annals of Probability, 1975.
[11] Herbert Robbins. Some Aspects of the Sequential Design of Experiments. Bulletin of the American Mathematical Society, 1952.