A Finite-Time Analysis of Multi-armed Bandits Problems with Kullback-Leibler Divergences
Rémi Munos | Gilles Stoltz | Odalric-Ambrym Maillard