暂无分享,去创建一个
[1] W. R. Thompson. ON THE LIKELIHOOD THAT ONE UNKNOWN PROBABILITY EXCEEDS ANOTHER IN VIEW OF THE EVIDENCE OF TWO SAMPLES , 1933 .
[2] R. Agrawal. Sample mean based index policies by O(log n) regret for the multi-armed bandit problem , 1995, Advances in Applied Probability.
[3] Malcolm J. A. Strens,et al. A Bayesian Framework for Reinforcement Learning , 2000, ICML.
[4] Peter Auer,et al. Finite-time Analysis of the Multiarmed Bandit Problem , 2002, Machine Learning.
[5] Feng Qi (祁锋),et al. COMPLETELY MONOTONIC FUNCTION ASSOCIATED WITH THE GAMMA FUNCTIONS AND PROOF OF WALLIS' INEQUALITY , 2005 .
[6] Constantin P. Niculescu,et al. Convex Functions and Their Applications: A Contemporary Approach , 2005 .
[7] Aurélien Garivier,et al. Parametric Bandits: The Generalized Linear Case , 2010, NIPS.
[8] Csaba Szepesvári,et al. Online Least Squares Estimation with Self-Normalized Processes: An Application to Bandit Problems , 2011, ArXiv.
[9] S. Li. Concise Formulas for the Area and Volume of a Hyperspherical Cap , 2011 .
[10] Csaba Szepesvári,et al. Improved Algorithms for Linear Stochastic Bandits , 2011, NIPS.
[11] Laurence B. Milstein,et al. Chernoff-Type Bounds for the Gaussian Error Function , 2011, IEEE Transactions on Communications.
[12] Lihong Li,et al. An Empirical Evaluation of Thompson Sampling , 2011, NIPS.
[13] Rémi Munos,et al. Thompson Sampling: An Asymptotically Optimal Finite-Time Analysis , 2012, ALT.
[14] Sébastien Bubeck,et al. Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems , 2012, Found. Trends Mach. Learn..
[15] David S. Leslie,et al. Optimistic Bayesian Sampling in Contextual-Bandit Problems , 2012, J. Mach. Learn. Res..
[16] Shipra Agrawal,et al. Analysis of Thompson Sampling for the Multi-armed Bandit Problem , 2011, COLT.
[17] Shipra Agrawal,et al. Further Optimal Regret Bounds for Thompson Sampling , 2012, AISTATS.
[18] Shipra Agrawal,et al. Thompson Sampling for Contextual Bandits with Linear Payoffs , 2012, ICML.
[19] Rémi Munos,et al. Thompson Sampling for 1-Dimensional Exponential Family Bandits , 2013, NIPS.
[20] Benjamin Van Roy,et al. Learning to Optimize via Posterior Sampling , 2013, Math. Oper. Res..
[21] Ambuj Tewari,et al. Online Linear Optimization via Smoothing , 2014, COLT.
[22] Sébastien Bubeck,et al. Prior-free and prior-dependent regret bounds for Thompson Sampling , 2013, 2014 48th Annual Conference on Information Sciences and Systems (CISS).
[23] Shie Mannor,et al. Thompson Sampling for Learning Parameterized Markov Decision Processes , 2014, COLT.
[24] Branislav Kveton,et al. Efficient Learning in Large-Scale Combinatorial Semi-Bandits , 2014, ICML.
[25] Ambuj Tewari,et al. Fighting Bandits with a New Kind of Smoothness , 2015, NIPS.
[26] Benjamin Van Roy,et al. Bootstrapped Thompson Sampling and Deep Exploration , 2015, ArXiv.
[27] Benjamin Van Roy,et al. An Information-Theoretic Analysis of Thompson Sampling , 2014, J. Mach. Learn. Res..