A Novel Implementation of Q-Learning for the Whittle Index

[1]  Urtzi Ayesta, et al. Scheduling of multi-class multi-server queueing systems with abandonments, 2017, J. Sched.

[2]  John N. Tsitsiklis, et al. The Complexity of Optimal Queuing Network Control, 1999, Math. Oper. Res.

[3]  Peter Jacko, et al. Resource capacity allocation to stochastic dynamic competitors: knapsack problem for perishable items and index-knapsack heuristic, 2016, Ann. Oper. Res.

[4]  A. Burnetas, et al. Optimal Adaptive Policies for Sequential Allocation Problems, 1996.

[5]  Kevin D. Glazebrook, et al. Multi-Armed Bandit Allocation Indices, 2011.

[6]  P. Whittle. Restless bandits: activity allocation in a changing world, 1988, Journal of Applied Probability.

[7]  Kevin D. Glazebrook, et al. A Generalized Gittins Index for a Class of Multiarmed Bandits with General Resource Requirements, 2009, Math. Oper. Res.

[8]  K. Glazebrook, et al. On the asymptotic optimality of greedy index heuristics for multi-action restless bandits, 2015, Advances in Applied Probability.

[9]  R. Weber, et al. On an index policy for restless bandits, 1990.

[10]  Jack Bowden, et al. Response-adaptive randomization for multi-arm clinical trials using the forward looking Gittins index rule, 2015, Biometrics.

[11]  Kevin D. Glazebrook, et al. Index Policies for Shooting Problems, 2007, Oper. Res.

[12]  José Niño-Mora, et al. Dynamic priority allocation via restless bandit marginal productivity indices, 2007, arXiv:2304.06115.

[13]  Benjamin Van Roy, et al. Learning to Optimize via Posterior Sampling, 2013, Math. Oper. Res.

[14]  Peter Jacko, et al. Generalized Restless Bandits and the Knapsack Problem for Perishable Inventories, 2014, Oper. Res.

[15]  Apostolos Burnetas, et al. Optimal Adaptive Policies for Markov Decision Processes, 1997, Math. Oper. Res.

[16]  Kevin D. Glazebrook, et al. Index Policies for the Admission Control and Routing of Impatient Customers to Heterogeneous Service Stations, 2009, Oper. Res.

[17]  John R. Hauser, et al. Website Morphing, 2009, Mark. Sci.

[18]  Peter Auer, et al. Regret bounds for restless Markov bandits, 2012, Theor. Comput. Sci.

[19]  Kevin D. Glazebrook, et al. Stochastic scheduling: A short history of index policies and new approaches to index generation for dynamic resource allocation, 2014, J. Sched.

[20]  Urtzi Ayesta, et al. A modeling framework for optimizing the flow-level scheduling with time-varying channels, 2010, Perform. Evaluation.

[21]  Kevin D. Glazebrook, et al. Indexability and Index Heuristics for a Simple Class of Inventory Routing Problems, 2009, Oper. Res.

[22]  M. Mohri, et al. Bandit Problems, 2006.