On the asymptotic optimality of greedy index heuristics for multi-action restless bandits

The class of restless bandits proposed by Whittle (1988) has long been known to be intractable. This paper presents an optimality result that extends that of Weber and Weiss (1990) for restless bandits to a more general setting in which individual bandits have multiple levels of activation but are subject to an overall resource constraint. The contribution is motivated by the recent work of Glazebrook et al. (2011a, 2011b), who discussed the performance of index heuristics for resource allocation in such systems. Hitherto, index heuristics have been shown, under a condition of full indexability, to be optimal for a natural Lagrangian relaxation of such problems in which a resource is purchased rather than constrained. We find that, under key assumptions about the nature of solutions to a deterministic differential equation, the index heuristics above are asymptotically optimal in a sense described by Whittle. We then demonstrate that these assumptions always hold for three-state bandits.

[1]  K. Glazebrook, et al.  General notions of indexability for queueing control and asset management, 2011, 1211.1775.

[2]  Urtzi Ayesta, et al.  A nearly-optimal index rule for scheduling of users with abandonment, 2011, 2011 Proceedings IEEE INFOCOM.

[3]  Kevin D. Glazebrook, et al.  Dynamic resource allocation in a multi-product make-to-stock production system, 2011, Queueing Syst. Theory Appl.

[4]  Kevin D. Glazebrook, et al.  Multi-Armed Bandit Allocation Indices, 2011.

[5]  Kevin D. Glazebrook, et al.  Index Policies for the Admission Control and Routing of Impatient Customers to Heterogeneous Service Stations, 2009, Oper. Res.

[6]  Kevin D. Glazebrook, et al.  Indexability and Index Heuristics for a Simple Class of Inventory Routing Problems, 2009, Oper. Res.

[7]  José Niño-Mora, et al.  Dynamic priority allocation via restless bandit marginal productivity indices, 2007, 2304.06115.

[8]  Richard Weber, et al.  Comments on: Dynamic priority allocation via restless bandit marginal productivity indices, 2007.

[9]  Onno Boxma, et al.  Comments on: Dynamic priority allocation via restless bandit marginal productivity indices, 2007.

[10]  Felipe Caro, et al.  Dynamic Assortment with Demand Learning for Seasonal Consumer Goods, 2007, Manag. Sci.

[11]  Kevin D. Glazebrook, et al.  Index policies for the maintenance of a collection of machines by a set of repairmen, 2005, Eur. J. Oper. Res.

[12]  Vidyadhar G. Kulkarni, et al.  Outsourcing warranty repairs: Dynamic allocation, 2005.

[13]  R. R. Lumley, et al.  On the optimal allocation of service to impatient tasks, 2004, Journal of Applied Probability.

[14]  K. Glazebrook, et al.  Index policies for a class of discounted restless bandits, 2002, Advances in Applied Probability.

[15]  John N. Tsitsiklis, et al.  The Complexity of Optimal Queuing Network Control, 1999, Math. Oper. Res.

[16]  Lawrence M. Wein, et al.  Scheduling a Make-To-Stock Queue: Index Policies and Hedging Points, 1996, Oper. Res.

[17]  Martin L. Puterman, et al.  Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.

[18]  R. Weber, et al.  Addendum to ‘On an index policy for restless bandits’, 1991, Advances in Applied Probability.

[19]  R. Weber, et al.  On an index policy for restless bandits, 1990, Journal of Applied Probability.

[20]  Alan Weiss, et al.  A closed network with a discriminatory processor-sharing server, 1989, SIGMETRICS '89.

[21]  Debasis Mitra, et al.  A transient analysis of a data network with a processor-sharing switch, 1988, AT&T Technical Journal.

[22]  J. Bather, et al.  Multi‐Armed Bandit Allocation Indices, 1990.

[23]  Christian M. Ernst, et al.  Multi-armed Bandit Allocation Indices, 1989.

[24]  P. Whittle.  Restless bandits: activity allocation in a changing world, 1988, Journal of Applied Probability.

[25]  J. Gittins.  Bandit processes and dynamic allocation indices, 1979.