The Sample Complexity of Exploration in the Multi-Armed Bandit Problem
[1] J. Gani, et al. Progress in Statistics, 1975.
[2] Sheldon M. Ross. Stochastic Processes, Wiley, 1996.
[3] I. Johnstone, et al. Asymptotically optimal procedures for sequential adaptive selection of the best of several normal means, 1982.
[4] D. Siegmund. Sequential Analysis: Tests and Confidence Intervals, 1985.
[5] H. Chernoff. Sequential Analysis and Optimal Design, 1987.
[6] S. Gupta, et al. Statistical Decision Theory and Related Topics IV, 1988.
[7] Nicolò Cesa-Bianchi, et al. Gambling in a rigged casino: The adversarial multi-armed bandit problem, Proceedings of the 36th Annual IEEE Symposium on Foundations of Computer Science, 1995.
[8] Martin Anthony, Peter L. Bartlett. Neural Network Learning: Theoretical Foundations, 1999.
[9] Sanjeev R. Kulkarni, et al. Finite-time lower bounds for the two-armed bandit problem, IEEE Transactions on Automatic Control, 2000.
[10] Peter Auer, et al. The Nonstochastic Multiarmed Bandit Problem, SIAM Journal on Computing, 2002.
[11] Shie Mannor, et al. PAC Bounds for Multi-armed Bandit and Markov Decision Processes, COLT, 2002.
[12] Peter Auer, et al. Finite-time Analysis of the Multiarmed Bandit Problem, Machine Learning, 2002.
[13] T. L. Lai, Herbert Robbins. Asymptotically efficient adaptive allocation rules, Advances in Applied Mathematics, 1985.