Taming the Monster: A Fast and Simple Algorithm for Contextual Bandits
John Langford | Lihong Li | Robert E. Schapire | Alekh Agarwal | Satyen Kale | Daniel J. Hsu
[1] W. R. Thompson. On the Likelihood that One Unknown Probability Exceeds Another in View of the Evidence of Two Samples, 1933.
[2] Robert E. Schapire, et al. Predicting Nearly As Well As the Best Pruning of a Decision Tree, 1995, COLT '95.
[3] Richard S. Sutton, et al. Introduction to Reinforcement Learning, 1998.
[4] John Langford, et al. Beating the hold-out: bounds for K-fold and progressive cross-validation, 1999, COLT '99.
[5] Vladimir Vovk, et al. Predicting nearly as well as the best pruning of a decision tree through dynamic programming scheme, 2001, Theor. Comput. Sci.
[6] Peter Auer, et al. Using Confidence Bounds for Exploitation-Exploration Trade-offs, 2003, J. Mach. Learn. Res.
[7] Peter Auer, et al. The Nonstochastic Multiarmed Bandit Problem, 2002, SIAM J. Comput.
[8] Jerry Alan Fails, et al. Interactive machine learning, 2003, IUI '03.
[9] Yiming Yang, et al. RCV1: A New Benchmark Collection for Text Categorization Research, 2004, J. Mach. Learn. Res.
[10] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[11] J. Langford, et al. The Epoch-Greedy algorithm for contextual multi-armed bandits, 2007, NIPS.
[12] Matthew J. Streeter, et al. Tighter Bounds for Multi-Armed Bandits with Expert Advice, 2009, COLT.
[13] John Langford, et al. The offset tree for learning with partial labels, 2008, KDD.
[14] Wei Chu, et al. A contextual-bandit approach to personalized news article recommendation, 2010, WWW '10.
[15] John Langford, et al. Contextual Bandit Algorithms with Supervised Learning Guarantees, 2010, AISTATS.
[16] John Langford, et al. Doubly Robust Policy Evaluation and Learning, 2011, ICML.
[17] Lihong Li, et al. An Empirical Evaluation of Thompson Sampling, 2011, NIPS.
[18] John Langford, et al. Efficient Optimal Learning for Contextual Bandits, 2011, UAI.
[19] Wei Chu, et al. Contextual Bandits with Linear Payoff Functions, 2011, AISTATS.
[20] Lihong Li, et al. Generalized Thompson Sampling for Contextual Bandits, 2013, ArXiv.