On the optimality of a myopic policy in multi-state channel probing

We consider the channel probing problem arising in opportunistic scheduling over fading channels, cognitive radio networks, and resource-constrained jamming. The communication system consists of N channels, each modeled as a multi-state Markov chain (M.C.). In each time period, a user selects one channel to probe and uses it to transmit information. Each transmission yields a reward that depends on the state of the selected channel. The objective is to design a channel probing policy that maximizes the expected total reward collected over a finite or infinite horizon. This problem can be viewed as an instance of the restless bandit problem, for which the form of optimal policies is unknown in general. We identify sufficient conditions that guarantee the optimality of a myopic probing policy.
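To make the notion of a myopic probing policy concrete, the following is a minimal sketch and not the paper's formulation. It assumes that all channels share one known transition matrix `P`, that the user keeps a belief vector (a distribution over states) for each channel, and that probing reveals the selected channel's current state exactly. The function names and the belief-update convention are illustrative assumptions.

```python
# Sketch of a myopic probing rule under the assumptions stated above.
# P[i][j] is the assumed probability of moving from state i to state j;
# rewards[s] is the reward collected when the probed channel is in state s.

def expected_reward(belief, rewards):
    """Expected immediate reward of probing a channel with this belief."""
    return sum(b * r for b, r in zip(belief, rewards))

def propagate(belief, P):
    """One-step belief update for a channel that was not probed (belief * P)."""
    k = len(P)
    return [sum(belief[i] * P[i][j] for i in range(k)) for j in range(k)]

def myopic_choice(beliefs, rewards):
    """Index of the channel with the largest expected immediate reward."""
    return max(range(len(beliefs)),
               key=lambda n: expected_reward(beliefs[n], rewards))

def step(beliefs, chosen, observed_state, P):
    """Beliefs for the next period after probing `chosen` and seeing its state."""
    updated = []
    for n, belief in enumerate(beliefs):
        if n == chosen:
            # The state is now known, so the next belief is that row of P.
            updated.append(list(P[observed_state]))
        else:
            updated.append(propagate(belief, P))
    return updated

if __name__ == "__main__":
    P = [[0.7, 0.3], [0.2, 0.8]]      # hypothetical two-state channel
    rewards = [0.0, 1.0]              # hypothetical per-state rewards
    beliefs = [[0.5, 0.5], [0.1, 0.9], [0.9, 0.1]]
    chosen = myopic_choice(beliefs, rewards)
    beliefs = step(beliefs, chosen, observed_state=0, P=P)
    print(chosen, beliefs)
```

The rule is myopic because it ranks channels by immediate expected reward only and ignores how the choice shapes future beliefs. Whether this greedy choice is also optimal over the whole horizon is the question the sufficient conditions address.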
