A Novel Implementation of Q-Learning for the Whittle Index