Optimal index rules for single resource allocation to stochastic dynamic competitors

In this paper we present a generic Markov decision process model of optimal allocation of a single resource to a collection of stochastic dynamic competitors. The main goal is to identify sufficient conditions under which this problem is optimally solved by an index rule. We focus on the frozen-if-not-allocated assumption, which famously holds in problems including the multi-armed bandit problem, the tax problem, Klimov networks, job sequencing, and object search and detection. We approach the problem via a Lagrangian relaxation, which decomposes it into a collection of normalized parametric single-competitor subproblems, each of which is optimally solved by the well-known Gittins index. We show that the problem is equivalent to solving a time sequence of its Lagrangian relaxations. We further show that our approach yields insight into sufficient conditions for optimality of index rules in restless problems (in which the frozen-if-not-allocated assumption is dropped) with a single resource; this paper is the first to prove such conditions.
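To make the notion of an index rule concrete, the following is a minimal sketch, not the paper's own method. It assumes a discounted, finite-state competitor whose state is frozen when the resource is not allocated to it, and it computes Gittins indices with the restart-in-state characterization of Katehakis and Veinott, solved here by plain value iteration. The names `gittins_indices` and `index_rule`, the transition matrix `P`, the reward vector `r`, and the discount factor `beta` are all hypothetical and chosen for illustration only.

```python
def gittins_indices(P, r, beta, tol=1e-10, max_iter=100_000):
    """Gittins index (in reward-rate units) of every state of one competitor.

    Assumptions (illustrative, not taken from the paper):
      P    -- n x n row-stochastic transition matrix when the resource is allocated
      r    -- length-n list of one-period rewards when the resource is allocated
      beta -- discount factor, 0 < beta < 1
    """
    n = len(r)
    indices = []
    for i in range(n):
        # Restart-in-state problem: in every state j choose between
        # continuing from j and restarting as if the state were i.
        V = [0.0] * n
        for _ in range(max_iter):
            cont = [r[j] + beta * sum(P[j][k] * V[k] for k in range(n))
                    for j in range(n)]
            restart = cont[i]  # value of acting as if currently in state i
            V_new = [max(c, restart) for c in cont]
            diff = max(abs(a - b) for a, b in zip(V_new, V))
            V = V_new
            if diff < tol:
                break
        # Scale the restart value at i to obtain the index of state i.
        indices.append((1.0 - beta) * V[i])
    return indices


def index_rule(current_states, indices_per_competitor):
    """Return the competitor whose current state has the highest index."""
    best, best_value = None, float("-inf")
    for c, state in enumerate(current_states):
        value = indices_per_competitor[c][state]
        if value > best_value:
            best, best_value = c, value
    return best
```

Under these assumptions, the index of each competitor is computed once from its own dynamics, and the rule then compares only the indices of the current states at each decision epoch. Ties are broken here in favor of the lowest-numbered competitor, which is an arbitrary choice of this sketch.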
