A Near-Optimal Policy for Channel Allocation in Cognitive Radio

Several tasks of interest in digital communications can be cast into the framework of planning in Partially Observable Markov Decision Processes (POMDPs). In this contribution, we consider a previously proposed model for a channel allocation task and develop an approach to compute a near-optimal policy. The proposed method is based on approximate (point-based) value iteration in a continuous-state Markov Decision Process (MDP); it uses a specific internal state as well as an original discretization scheme for the internal points. The obtained results provide interesting insights into the behavior of the optimal policy in the channel allocation model.
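
To make the point-based machinery concrete, the sketch below runs a generic point-based value-iteration (PBVI) loop on a toy two-channel instance of the allocation problem. Everything here is an illustrative assumption: the Gilbert-Elliott channel parameters, the unit-reward convention, and the random belief grid, which merely stands in for the paper's internal-point discretization scheme, whose details the abstract does not spell out.

```python
import numpy as np

# Toy channel-allocation POMDP: two independent Gilbert-Elliott channels;
# each slot the agent senses one channel and earns a unit reward if that
# channel is currently in the "good" state. Parameters are illustrative.
p01, p11 = 0.2, 0.8   # P(good | was bad), P(good | was good), per channel
gamma = 0.95          # discount factor
S, A, O = 4, 2, 2     # joint states (c1, c2), actions = channel to sense,
                      # observations = sensed channel bad (0) / good (1)

Tc = np.array([[1 - p01, p01],
               [1 - p11, p11]])          # single-channel transition matrix

T = np.zeros((A, S, S))                  # T[a, s, s']; channels evolve
for s in range(S):                       # independently of the action
    c1, c2 = s >> 1, s & 1
    for s2 in range(S):
        d1, d2 = s2 >> 1, s2 & 1
        T[:, s, s2] = Tc[c1, d1] * Tc[c2, d2]

Z = np.zeros((A, S, O))                  # Z[a, s', o]: we observe the new
for s2 in range(S):                      # state of the sensed channel only
    Z[0, s2, s2 >> 1] = 1.0
    Z[1, s2, s2 & 1] = 1.0

R = np.zeros((A, S))                     # unit reward if the sensed channel
for s in range(S):                       # is currently good
    R[0, s], R[1, s] = s >> 1, s & 1

def backup(b, alphas):
    """One point-based Bellman backup at belief b; returns the best new
    alpha-vector over all actions."""
    best, best_val = None, -np.inf
    for a in range(A):
        g = R[a].copy()
        for o in range(O):
            # gao[k, s] = sum_{s'} T[a,s,s'] * Z[a,s',o] * alphas[k,s']
            gao = alphas @ (T[a] * Z[a][:, o]).T
            g += gamma * gao[np.argmax(gao @ b)]
        if g @ b > best_val:
            best, best_val = g, g @ b
    return best

def pbvi(beliefs, n_iter=60):
    """Iterate point-based backups over a fixed belief set."""
    alphas = np.zeros((1, S))            # start from the zero value function
    for _ in range(n_iter):
        alphas = np.unique(
            np.array([backup(b, alphas) for b in beliefs]), axis=0)
    return alphas

# A crude belief grid (simplex corners plus Dirichlet samples) stands in
# for the paper's discretization of the internal points.
rng = np.random.default_rng(0)
beliefs = np.vstack([np.eye(S), rng.dirichlet(np.ones(S), size=40)])
V = pbvi(beliefs)
print("value lower bound at uniform belief:",
      float((V @ np.full(S, 0.25)).max()))
```

Because each backup improves the value only at the chosen belief points, the resulting alpha-vectors yield a lower bound on the optimal value function; a denser belief grid tightens the bound at the cost of more backups per iteration.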
