On the computation of Whittle’s index for Markovian restless bandits

The multi-armed restless bandit framework can model a wide variety of decision-making problems in areas as diverse as industrial engineering, computer communication, operations research, financial engineering, and communication networks. In a seminal work, Whittle developed a methodology for deriving well-performing index policies (Whittle’s index policies) by solving a relaxed version of the original problem. However, computing Whittle’s index is itself a difficult problem, and researchers have therefore focused on calculating it numerically or with problem-dependent approaches. As our main contribution, we derive an analytical expression for Whittle’s index for any Markovian bandit with both finite and infinite transition rates. We derive sufficient conditions for the optimal solution of the relaxed problem to be of threshold type, and we obtain conditions for the bandit to be indexable, a property that ensures the existence of Whittle’s index. Our solution approach provides a unifying expression for Whittle’s index, which we highlight by retrieving known indices from the literature as particular cases. The applicability of finite rates is illustrated with the machine repairmen problem, and that of infinite rates with an example from communication networks in which transmission rates react instantaneously to packet losses.
