On the Combinatorial Multi-Armed Bandit Problem with Markovian Rewards
[1] Qing Zhao et al. Decentralized multi-armed bandit with multiple distributed players, 2010, 2010 Information Theory and Applications Workshop (ITA).
[2] R. Agrawal. Sample mean based index policies with O(log n) regret for the multi-armed bandit problem, 1995, Advances in Applied Probability.
[3] Harold W. Kuhn. The Hungarian method for the assignment problem, 1955, 50 Years of Integer Programming.
[4] Peter Auer et al. Finite-time Analysis of the Multiarmed Bandit Problem, 2002, Machine Learning.
[5] Mingyan Liu et al. Online learning in opportunistic spectrum access: A restless bandit approach, 2010, 2011 Proceedings IEEE INFOCOM.
[6] T. L. Lai and Herbert Robbins. Asymptotically Efficient Adaptive Allocation Rules, 1985, Advances in Applied Mathematics.
[7] Deepayan Chakrabarti et al. Multi-armed bandit problems with dependent arms, 2007, ICML '07.
[8] Ao Tang et al. Opportunistic Spectrum Access with Multiple Users: Learning under Competition, 2010, 2010 Proceedings IEEE INFOCOM.
[10] Qing Zhao et al. Logarithmic weak regret of non-Bayesian restless multi-armed bandit, 2011, 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[12] Mingyan Liu et al. Online algorithms for the multi-armed bandit problem with Markovian rewards, 2010, 2010 48th Annual Allerton Conference on Communication, Control, and Computing (Allerton).
[13] J. Walrand et al. Asymptotically efficient allocation rules for the multiarmed bandit problem with multiple plays, Part II: Markovian rewards, 1987.
[14] Yi Gai et al. Learning Multiuser Channel Allocations in Cognitive Radio Networks: A Combinatorial Multi-Armed Bandit Formulation, 2010, 2010 IEEE Symposium on New Frontiers in Dynamic Spectrum (DySPAN).
[15] Wenhan Dai et al. The non-Bayesian restless multi-armed bandit: A case of near-logarithmic regret, 2010, 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[16] John N. Tsitsiklis et al. Linearly Parameterized Bandits, 2008, Math. Oper. Res.