Conservation laws, extended polymatroids and multi-armed bandit problems: a unified approach to indexable systems

We show that if performance measures in stochastic and dynamic scheduling problems satisfy generalized conservation laws, then the feasible region of achievable performance is a polyhedron called an extended polymatroid, which generalizes the classical polymatroids introduced by Edmonds. Optimization of a linear objective over an extended polymatroid is solved by an adaptive greedy algorithm, which leads to an optimal solution having an indexability property (indexable systems). Under a certain condition, the indices possess a stronger decomposition property (decomposable systems). The following problems can be analyzed using our theory: multiarmed bandit problems, branching bandits, scheduling of multiclass queues with or without feedback, and scheduling of a batch of jobs. Consequences of our results include: (1) a characterization of indexable systems as systems that satisfy generalized conservation laws, (2) a sufficient condition for indexable systems to be decomposable, (3) a new linear programming proof of the decomposability property of Gittins indices in multiarmed bandit problems, (4) an approach to sensitivity analysis of indexable systems, (5) a characterization of the indices of indexable systems as sums of dual variables, and an economic interpretation of the branching bandit indices in terms of retirement options, (6) an analysis of the indexability of undiscounted branching bandits, (7) a new algorithm to compute the indices of indexable systems (in particular Gittins indices), as fast as the fastest known algorithm, (8) a unification of Klimov's algorithm for multiclass queues and Gittins' algorithm for multiarmed bandits as special cases of the same algorithm, (9) a closed formula for the maximum reward of the multiarmed bandit problem, with a new proof of its submodularity, and (10) an understanding of the invariance of the indices with respect to some parameters of the problem.
Our approach provides a polyhedral treatment of several classical problems in stochastic and dynamic scheduling and is able to address variations such as: discounted versus undiscounted cost criterion, rewards versus taxes, discrete versus continuous time, and linear versus nonlinear objective functions.
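
The abstract does not spell out the adaptive greedy algorithm, so the following is only a minimal sketch of one way such a scheme can look, under stated assumptions. It assumes the linear program maximizes sum_i R_i x_i subject to sum_{i in S} A_i^S x_i >= b(S) for every proper subset S of the class set N, with equality for S = N, and with all coefficients A_i^S positive. Under that assumption, LP duality suggests building dual variables y^S along a nested chain of subsets and taking each index as a running sum of those dual variables, in line with consequence (5). The function name `adaptive_greedy` and the callable `coeff(i, S)` (standing in for A_i^S) are hypothetical names chosen for illustration.

```python
from typing import Callable, Dict, FrozenSet, List, Tuple


def adaptive_greedy(
    rewards: Dict[str, float],
    coeff: Callable[[str, FrozenSet[str]], float],
) -> Tuple[Dict[str, float], List[str]]:
    """Sketch of an adaptive greedy index computation (assumed LP form).

    rewards: R_i for each class i.
    coeff:   hypothetical callable returning A_i^S > 0 for i in S.
    Returns (indices, order), where order lists classes from the first
    one selected to the last one selected.
    """
    remaining = set(rewards)
    chain: List[Tuple[FrozenSet[str], float]] = []  # (S, y^S) found so far
    indices: Dict[str, float] = {}
    order: List[str] = []
    running_index = 0.0

    while remaining:
        current = frozenset(remaining)
        best_class, best_ratio = None, None
        for i in sorted(remaining):  # sorted only for deterministic ties
            # Reward left after subtracting the dual terms already fixed
            # on the larger sets of the chain, all of which contain i.
            reduced = rewards[i] - sum(coeff(i, s) * y for s, y in chain)
            ratio = reduced / coeff(i, current)
            if best_ratio is None or ratio > best_ratio:
                best_class, best_ratio = i, ratio

        chain.append((current, best_ratio))  # dual variable y^S for S = current
        running_index += best_ratio          # index = sum of dual variables
        indices[best_class] = running_index
        order.append(best_class)
        remaining.remove(best_class)

    return indices, order


# With A_i^S = 1 for all i and S (the classical polymatroid case),
# the indices reduce to the rewards themselves.
idx, order = adaptive_greedy({"a": 3.0, "b": 5.0, "c": 1.0}, lambda i, s: 1.0)
print(order)  # ['b', 'a', 'c']
```

In this sketch the first class selected receives the largest index, and each later index is obtained by adding a nonpositive dual variable, so the selection order is also the priority order. When the coefficients depend on S, the reduced rewards change at every step, which is what makes the greedy procedure adaptive.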
