A Faster Index Algorithm and a Computational Study for Bandits with Switching Costs

We address the intractable multi-armed bandit problem with switching costs, for which an index that partially characterizes optimal policies was introduced (Asawa, M., D. Teneketzis. 1996. Multi-armed bandits with switching penalties. IEEE Trans. Automatic Control 41 328–348), attaching to each project state a “continuation index” (its Gittins index) and a “switching index.” Asawa and Teneketzis proposed to compute both jointly as the Gittins index of a project with 2n states, where n is the number of states of the original project, resulting in an eightfold increase in O(n^3) arithmetic operations relative to those needed to compute the continuation index alone. We present a faster decoupled computation method, which in a first stage computes the continuation index and then, in a second stage, computes the switching index an order of magnitude faster, in at most n^2 + O(n) arithmetic operations, achieving overall a fourfold reduction in arithmetic operations and substantially reduced memory operations. The analysis exploits the fact that the Asawa and Teneketzis index is the marginal productivity index of the project in its restless reformulation, using methods introduced by the author. Extensive computational experiments are reported, which demonstrate the dramatic runtime speedups achieved by the new algorithm, as well as the near optimality of the resultant index policy and its substantial gains against the benchmark Gittins index policy across a wide range of randomly generated two- and three-project instances.
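For orientation, the continuation index mentioned above is the project's Gittins index. The following is a minimal sketch of how such an index can be computed for a small finite-state project, using the classical largest-index-first recursion usually attributed to Varaiya, Walrand, and Buyukkoc. It is not the fast-pivoting or decoupled method of this paper, and it makes no attempt to match the operation counts quoted above. The names `solve`, `gittins_indices`, `P`, `r`, and `beta` are illustrative assumptions: `P` is a row-stochastic transition matrix, `r` the vector of active rewards, and `beta` a discount factor in (0, 1).

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [A | b], copied so the inputs are not modified.
    M = [[float(v) for v in A[i]] + [float(b[i])] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda i: abs(M[i][c]))
        M[c], M[p] = M[p], M[c]
        for i in range(c + 1, n):
            f = M[i][c] / M[c][c]
            for k in range(c, n + 1):
                M[i][k] -= f * M[c][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def gittins_indices(P, r, beta):
    """Gittins index of every state (illustrative sketch, not optimized).

    States are ranked from highest to lowest index. With C the set of
    states ranked so far, a candidate j outside C is scored by
    (expected discounted reward) / (expected discounted time) when the
    project is engaged once in j and then for as long as it stays in C.
    """
    n = len(r)
    index = [None] * n
    in_C = [False] * n
    for _ in range(n):
        # A = I - beta * P_C, where P_C keeps only the columns of P in C.
        A = [[(1.0 if i == k else 0.0) - (beta * P[i][k] if in_C[k] else 0.0)
              for k in range(n)] for i in range(n)]
        N = solve(A, r)            # discounted reward measure
        D = solve(A, [1.0] * n)    # discounted time measure
        j = max((i for i in range(n) if not in_C[i]),
                key=lambda i: N[i] / D[i])
        index[j] = N[j] / D[j]
        in_C[j] = True
    return index
```

As a sanity check under these assumptions, a two-state project in which state 0 (reward 0) moves deterministically to an absorbing state 1 (reward 1) gives index 1 for state 1 and index `beta` for state 0, so `gittins_indices([[0, 1], [0, 1]], [0, 1], 0.5)` yields values close to `[0.5, 1.0]`. Each pass solves two n-by-n linear systems from scratch, which is far more work than the cubic counts discussed in the abstract; the sketch is meant to show what is being computed, not how to compute it efficiently.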
