Online algorithms for the multi-armed bandit problem with Markovian rewards
Mingyan Liu | Cem Tekin
[1] R. Agrawal. Sample mean based index policies with O(log n) regret for the multi-armed bandit problem, 1995, Advances in Applied Probability.
[2] P. Lezaud. Chernoff-type bound for finite Markov chains, 1998.
[3] Qing Zhao, et al. Distributed learning in multi-armed bandit with multiple players, 2010.
[4] J. Gittins. Bandit processes and dynamic allocation indices, 1979.
[5] J. Walrand, et al. Asymptotically efficient allocation rules for the multiarmed bandit problem with multiple plays-Part II: Markovian rewards, 1987.
[6] Peter Auer, et al. Finite-time Analysis of the Multiarmed Bandit Problem, 2002, Machine Learning.
[7] Ao Tang, et al. Opportunistic Spectrum Access with Multiple Users: Learning under Competition, 2010, 2010 Proceedings IEEE INFOCOM.
[8] H. Robbins. Some aspects of the sequential design of experiments, 1952.
[9] Peter Auer, et al. The Nonstochastic Multiarmed Bandit Problem, 2002, SIAM J. Comput.
[10] T. L. Lai and Herbert Robbins. Asymptotically Efficient Adaptive Allocation Rules, 1985.
[11] Qing Zhao, et al. Distributed Learning in Multi-Armed Bandit With Multiple Players, 2009, IEEE Transactions on Signal Processing.