Whittle Index for Partially Observed Binary Markov Decision Processes

We consider the problem of dynamically scheduling $M$ out of $N$ binary Markov chains when only noisy observations of the state are available, under an ergodic (equivalently, long-run average) reward criterion. Passing to the equivalent problem of controlling the conditional distribution of the state given the observations and controls, we cast it as a restless bandit problem and establish its Whittle indexability.
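As an illustration of what "controlling the conditional distribution of the state" means for a single binary chain, the following is a minimal sketch of a one-step belief (filter) update. The abstract does not specify the observation model, so everything here is an assumption made for illustration: the name `belief_update`, the transition parameters `p01` and `p11`, the binary observation with likelihoods `q0` and `q1`, and the convention that `obs=None` means no observation was received in that step.

```python
def belief_update(pi, p01, p11, obs=None, q0=0.0, q1=0.0):
    """One-step Bayes filter for a two-state (0/1) hidden Markov chain.

    Hypothetical parameterization (not taken from the paper):
      pi  : current belief P(X_t = 1 | history)
      p01 : P(X_{t+1} = 1 | X_t = 0)
      p11 : P(X_{t+1} = 1 | X_t = 1)
      obs : None (no observation this step) or a binary observation in {0, 1}
      q0  : P(Y = 1 | X = 0)
      q1  : P(Y = 1 | X = 1)
    Returns the updated belief P(X_{t+1} = 1 | history, obs).
    """
    # Prediction step: push the belief through the transition kernel.
    pred = pi * p11 + (1.0 - pi) * p01
    if obs is None:
        return pred

    # Correction step: weight by the observation likelihood and normalize.
    like1 = q1 if obs == 1 else 1.0 - q1
    like0 = q0 if obs == 1 else 1.0 - q0
    denom = pred * like1 + (1.0 - pred) * like0
    if denom == 0.0:
        # Observation has zero probability under the model; keep the prediction.
        return pred
    return pred * like1 / denom
```

Under these assumptions the scalar belief `pi` in [0, 1] plays the role of the state of each arm in the reformulated restless bandit problem; a scheduling policy would then select $M$ of the $N$ beliefs to act on at each step.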
