Simulation Studies of Multi-armed Bandits with Covariates (Invited Paper)