Exploration-exploitation tradeoff using variance estimates in multi-armed bandits
Jean-Yves Audibert | Rémi Munos | Csaba Szepesvári
[1] Peter Auer, et al. Improved Rates for the Stochastic Continuum-Armed Bandit Problem, 2007, COLT.
[2] Csaba Szepesvári, et al. Bandit Based Monte-Carlo Planning, 2006, ECML.
[3] Olivier Teytaud, et al. Modification of UCT with Patterns in Monte-Carlo Go, 2006.
[4] Peter Auer, et al. Finite-time Analysis of the Multiarmed Bandit Problem, 2002, Machine Learning.
[5] R. Agrawal. Sample Mean Based Index Policies with O(log n) Regret for the Multi-Armed Bandit Problem, 1995, Advances in Applied Probability.
[6] S. Yakowitz, et al. Machine Learning and Nonparametric Bandit Theory, 1995, IEEE Transactions on Automatic Control.
[7] J. Bather, et al. Multi-Armed Bandit Allocation Indices, 1990.
[8] D. Freedman. On Tail Probabilities for Martingales, 1975.
[9] W. Hoeffding. Probability Inequalities for Sums of Bounded Random Variables, 1963.
[10] H. Robbins. Some Aspects of the Sequential Design of Experiments, 1952.
[11] W. R. Thompson. On the Likelihood that One Unknown Probability Exceeds Another in View of the Evidence of Two Samples, 1933.
[12] T. L. Lai, Herbert Robbins. Asymptotically Efficient Adaptive Allocation Rules, 1985, Advances in Applied Mathematics.