Multi-armed bandit problems with dependent arms
Deepayan Chakrabarti | Sandeep Pandey | Deepak Agarwal
[1] J. Gittins. Bandit processes and dynamic allocation indices, 1979.
[2] P. Whittle. Multi-Armed Bandits and the Gittins Index, 1980.
[3] T. Lai. Adaptive treatment allocation and the multi-armed bandit problem, 1987.
[4] T. Lai, et al. Optimal stopping and dynamic allocation, 1987, Advances in Applied Probability.
[5] David J. C. MacKay, et al. Information-Based Objective Functions for Active Data Selection, 1992, Neural Computation.
[6] Martin L. Puterman, et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[7] R. Agrawal. Sample mean based index policies by O(log n) regret for the multi-armed bandit problem, 1995, Advances in Applied Probability.
[8] Peter Auer, et al. Finite-time Analysis of the Multiarmed Bandit Problem, 2002, Machine Learning.
[9] H. Vincent Poor, et al. Bandit problems with side observations, 2005, IEEE Transactions on Automatic Control.
[10] Sanjoy Dasgupta, et al. Coarse sample complexity bounds for active learning, 2005, NIPS.
[11] Csaba Szepesvári, et al. Bandit Based Monte-Carlo Planning, 2006, ECML.
[12] Deepayan Chakrabarti, et al. Bandits for Taxonomies: A Model-based Approach, 2007, SDM.
[13] T. L. Lai and Herbert Robbins. Asymptotically Efficient Adaptive Allocation Rules, 1985.