A Novel Implementation of Q-Learning for the Whittle Index

Yoni Nazarathy | Peter Jacko | Lachlan J. Gibson

[1] Urtzi Ayesta, et al. Scheduling of multi-class multi-server queueing systems with abandonments, 2017, J. Sched.

[2] John N. Tsitsiklis, et al. The Complexity of Optimal Queuing Network Control, 1999, Math. Oper. Res.

[3] Peter Jacko, et al. Resource capacity allocation to stochastic dynamic competitors: knapsack problem for perishable items and index-knapsack heuristic, 2016, Ann. Oper. Res.

[4] A. Burnetas, et al. Optimal Adaptive Policies for Sequential Allocation Problems, 1996.

[5] Kevin D. Glazebrook, et al. Multi-Armed Bandit Allocation Indices, 2011.

[6] P. Whittle. Restless bandits: activity allocation in a changing world, 1988, Journal of Applied Probability.

[7] Kevin D. Glazebrook, et al. A Generalized Gittins Index for a Class of Multiarmed Bandits with General Resource Requirements, 2009, Math. Oper. Res.

[8] K. Glazebrook, et al. On the asymptotic optimality of greedy index heuristics for multi-action restless bandits, 2015, Advances in Applied Probability.

[9] R. Weber, et al. On an index policy for restless bandits, 1990.

[10] Jack Bowden, et al. Response-adaptive randomization for multi-arm clinical trials using the forward looking Gittins index rule, 2015, Biometrics.

[11] Kevin D. Glazebrook, et al. Index Policies for Shooting Problems, 2007, Oper. Res.

[12] José Niño-Mora, et al. Dynamic priority allocation via restless bandit marginal productivity indices, 2007, arXiv:2304.06115.

[13] Benjamin Van Roy, et al. Learning to Optimize via Posterior Sampling, 2013, Math. Oper. Res.

[14] Peter Jacko, et al. Generalized Restless Bandits and the Knapsack Problem for Perishable Inventories, 2014, Oper. Res.

[15] Apostolos Burnetas, et al. Optimal Adaptive Policies for Markov Decision Processes, 1997, Math. Oper. Res.

[16] Kevin D. Glazebrook, et al. Index Policies for the Admission Control and Routing of Impatient Customers to Heterogeneous Service Stations, 2009, Oper. Res.

[17] John R. Hauser, et al. Website Morphing, 2009, Mark. Sci.

[18] Peter Auer, et al. Regret bounds for restless Markov bandits, 2012, Theor. Comput. Sci.

[19] Kevin D. Glazebrook, et al. Stochastic scheduling: A short history of index policies and new approaches to index generation for dynamic resource allocation, 2014, J. Sched.

[20] Urtzi Ayesta, et al. A modeling framework for optimizing the flow-level scheduling with time-varying channels, 2010, Perform. Evaluation.

[21] Kevin D. Glazebrook, et al. Indexability and Index Heuristics for a Simple Class of Inventory Routing Problems, 2009, Oper. Res.

[22] M. Mohri, et al. Bandit Problems, 2006.