Further Optimal Regret Bounds for Thompson Sampling
[1] W. R. Thompson. On the Likelihood that One Unknown Probability Exceeds Another in View of the Evidence of Two Samples, 1933, Biometrika.
[2] Milton Abramowitz, et al. Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables, 1964.
[3] D. Owen. Handbook of Mathematical Functions with Formulas, 1965.
[4] Christian M. Ernst, et al. Multi-armed Bandit Allocation Indices, 1989.
[5] J. Bather, et al. Multi-Armed Bandit Allocation Indices, 1990.
[6] Jeremy Wyatt, et al. Exploration and Inference in Learning from Reinforcement, 1998.
[7] Malcolm J. A. Strens, et al. A Bayesian Framework for Reinforcement Learning, 2000, ICML.
[8] Costas Courcoubetis, et al. Pricing Communication Networks: Economics, Technology and Modelling (Wiley Interscience Series in Systems and Optimization), 2003.
[9] Peter Auer, et al. Finite-time Analysis of the Multiarmed Bandit Problem, 2002, Machine Learning.
[10] Emil Jerábek, et al. Dual Weak Pigeonhole Principle, Boolean Complexity, and Derandomization, 2004, Annals of Pure and Applied Logic.
[11] Jean-Yves Audibert, et al. Minimax Policies for Adversarial and Stochastic Bandits, 2009, COLT.
[12] Joaquin Quiñonero Candela, et al. Web-Scale Bayesian Click-Through Rate Prediction for Sponsored Search Advertising in Microsoft's Bing Search Engine, 2010, ICML.
[13] Ole-Christoffer Granmo, et al. Solving Two-Armed Bernoulli Bandit Problems Using a Bayesian Learning Automaton, 2010, Int. J. Intell. Comput. Cybern.
[14] Steven L. Scott, et al. A Modern Bayesian Look at the Multi-armed Bandit, 2010.
[15] Benedict C. May. Simulation Studies in Optimistic Bayesian Sampling in Contextual-Bandit Problems, 2011.
[16] Lihong Li, et al. An Empirical Evaluation of Thompson Sampling, 2011, NIPS.
[17] Rémi Munos, et al. A Finite-Time Analysis of Multi-armed Bandits Problems with Kullback-Leibler Divergences, 2011, COLT.
[18] Aurélien Garivier, et al. The KL-UCB Algorithm for Bounded Stochastic Bandits and Beyond, 2011, COLT.
[19] Rémi Munos, et al. Thompson Sampling: An Optimal Finite Time Analysis, 2012, arXiv.
[20] Aurélien Garivier, et al. On Bayesian Upper Confidence Bounds for Bandit Problems, 2012, AISTATS.
[21] Sébastien Bubeck, et al. Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems, 2012, Found. Trends Mach. Learn.
[22] David S. Leslie, et al. Optimistic Bayesian Sampling in Contextual-Bandit Problems, 2012, J. Mach. Learn. Res.
[23] Lihong Li, et al. Open Problem: Regret Bounds for Thompson Sampling, 2012, COLT.
[24] Shipra Agrawal, et al. Analysis of Thompson Sampling for the Multi-armed Bandit Problem, 2012, COLT.
[25] Shipra Agrawal, et al. Thompson Sampling for Contextual Bandits with Linear Payoffs, 2012, ICML.
[26] T. L. Lai and Herbert Robbins. Asymptotically Efficient Adaptive Allocation Rules, 1985, Advances in Applied Mathematics.