Sample mean based index policies with O(log n) regret for the multi-armed bandit problem
[1] R. Ellis. Entropy, large deviations, and statistical mechanics, 1985.
[2] Patrick Billingsley. Probability and Measure, 1986.
[3] L. Brown. Fundamentals of statistical exponential families: with applications in statistical decision theory, 1986.
[4] J. Walrand, et al. Asymptotically efficient allocation rules for the multiarmed bandit problem with multiple plays, Part II: Markovian rewards, 1987.
[5] T. Lai. Adaptive treatment allocation and the multi-armed bandit problem, 1987.
[6] D. Teneketzis, et al. Asymptotically efficient adaptive allocation rules for the multiarmed bandit problem with switching cost, 1988.
[7] D. Teneketzis, et al. Asymptotically Efficient Adaptive Allocation Schemes for Controlled I.I.D. Processes: Finite Paramet, 1988.
[8] R. Agrawal, et al. Certainty equivalence control with forcing: revisited, 1989, Proceedings of the 28th IEEE Conference on Decision and Control.
[9] R. Agrawal, et al. Asymptotically efficient adaptive allocation schemes for controlled Markov chains: finite parameter space, 1989.
[10] R. Agrawal. Adaptive Control of Markov Chains under the Weak Accessibility Condition, 1991.
[11] R. Agrawal, et al. Multi-armed bandit problems with multiple plays and switching cost, 1990.
[12] A. Dembo, et al. Large Deviations Techniques and Applications, 1994.
[13] T. L. Lai and Herbert Robbins. Asymptotically Efficient Adaptive Allocation Rules, 1985.