Multi-policy posterior sampling for restless Markov bandits
暂无分享,去创建一个
[1] Peter Auer,et al. Finite-time Analysis of the Multiarmed Bandit Problem , 2002, Machine Learning.
[2] Mingyan Liu,et al. Optimality of Myopic Sensing in Multi-Channel Opportunistic Access , 2008, 2008 IEEE International Conference on Communications.
[3] Peng Shi,et al. Approximation algorithms for restless bandit problems , 2007, JACM.
[4] John N. Tsitsiklis,et al. The Complexity of Optimal Queuing Network Control , 1999, Math. Oper. Res..
[5] P. Whittle. Restless Bandits: Activity Allocation in a Changing World , 1988 .
[6] Bhaskar Krishnamachari,et al. On myopic sensing for multi-channel opportunistic access: structure, optimality, and performance , 2007, IEEE Transactions on Wireless Communications.
[7] J. Gittins. Bandit processes and dynamic allocation indices , 1979 .
[8] Mingyan Liu,et al. Online Learning of Rested and Restless Bandits , 2011, IEEE Transactions on Information Theory.
[9] W. R. Thompson. ON THE LIKELIHOOD THAT ONE UNKNOWN PROBABILITY EXCEEDS ANOTHER IN VIEW OF THE EVIDENCE OF TWO SAMPLES , 1933 .