Towards Q-learning the Whittle Index for Restless Bandits
Peter G. Taylor | Jing Fu | Yoni Nazarathy | Sarat Moka
[1] José Niño-Mora, et al. Dynamic priority allocation via restless bandit marginal productivity indices, 2007, 2304.06115.
[2] Vivek S. Borkar, et al. A reinforcement learning algorithm for restless bandits, 2018, 2018 Indian Control Conference (ICC).
[3] A. V. den Boer, et al. Dynamic Pricing and Learning: Historical Origins, Current Research, and New Directions, 2013.
[4] Sarang Deo, et al. Improving Health Outcomes Through Better Capacity Allocation in a Community-Based Chronic Care Model, 2013, Oper. Res.
[5] R. Weber, et al. On an index policy for restless bandits, 1990, Journal of Applied Probability.
[6] Moshe Zukerman, et al. Asymptotically Optimal Job Assignment for Energy-Efficient Processor-Sharing Server Farms, 2016, IEEE Journal on Selected Areas in Communications.
[7] I. M. Verloop. Asymptotically optimal priority policies for indexable and nonindexable restless bandits, 2016, 1609.00563.
[8] Peter Auer, et al. Near-optimal Regret Bounds for Reinforcement Learning, 2008, J. Mach. Learn. Res.
[9] J. Gittins. Bandit processes and dynamic allocation indices, 1979.
[10] P. Whittle. Restless bandits: activity allocation in a changing world, 1988, Journal of Applied Probability.
[11] V. Borkar. Stochastic Approximation: A Dynamical Systems Viewpoint, 2008.
[12] Qing Zhao, et al. Indexability of Restless Bandit Problems and Optimality of Whittle Index for Dynamic Multichannel Access, 2008, IEEE Transactions on Information Theory.
[13] John N. Tsitsiklis, et al. The Complexity of Optimal Queuing Network Control, 1999, Math. Oper. Res.
[14] P. Taylor, et al. Restless Bandits in Action: Resource Allocation, Competition and Reservation, 2018.
[15] Dimitri P. Bertsekas, et al. Convergence Results for Some Temporal Difference Methods Based on Least Squares, 2009, IEEE Transactions on Automatic Control.
[16] D. Blackwell. Discrete Dynamic Programming, 1962.