Finite-time Analysis of the Multiarmed Bandit Problem
[1] G. Enderlein. Wilks, S. S.: Mathematical Statistics. J. Wiley and Sons, New York–London 1962; 644 S., 98 s , 1964 .
[2] T. L. Lai and Herbert Robbins. Asymptotically Efficient Adaptive Allocation Rules , 1985 .
[3] Bruce E. Hajek,et al. Cooling Schedules for Optimal Annealing , 1988, Math. Oper. Res..
[4] C. D. Gelatt,et al. Optimization by Simulated Annealing , 1983, Science.
[5] Leslie Pack Kaelbling,et al. Learning in embedded systems , 1993 .
[6] Andrew W. Moore,et al. Reinforcement Learning: A Survey , 1996, J. Artif. Intell. Res..
[7] Richard S. Sutton,et al. Introduction to Reinforcement Learning , 1998 .
[8] Ben J. A. Kröse,et al. Learning from delayed rewards , 1995, Robotics Auton. Syst..
[9] J. Neveu,et al. Discrete Parameter Martingales , 1975 .
[10] T. Lai. Asymptotically efficient adaptive control in stochastic regression models , 1986 .
[11] S. Dreyfus,et al. Thermodynamical Approach to the Traveling Salesman Problem : An Efficient Simulation Algorithm , 2004 .
[12] Wing W. Lowe,et al. Nonparametric bandit methods , 1991 .
[13] John H. Holland,et al. Adaptation in Natural and Artificial Systems: An Introductory Analysis with Applications to Biology, Control, and Artificial Intelligence , 1992 .
[14] P. Varaiya,et al. Multi-Armed bandit problem revisited , 1994 .
[15] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.
[16] Michael O. Duff,et al. Q-Learning for Bandit Problems , 1995, ICML.
[17] Andrew G. Barto,et al. Learning to Act Using Real-Time Dynamic Programming , 1995, Artif. Intell..
[18] J. Bather,et al. Multi‐Armed Bandit Allocation Indices , 1990 .
[19] Nicolò Cesa-Bianchi,et al. Gambling in a rigged casino: The adversarial multi-armed bandit problem , 1995, Proceedings of IEEE 36th Annual Foundations of Computer Science.
[20] R. Agrawal. Sample mean based index policies with O(log n) regret for the multi-armed bandit problem , 1995, Advances in Applied Probability.
[21] W. Hoeffding. Probability Inequalities for Sums of Bounded Random Variables , 1963 .
[22] D. Pollard. Convergence of stochastic processes , 1984 .
[23] A. Burnetas,et al. Optimal Adaptive Policies for Sequential Allocation Problems , 1996 .