Optimal, heuristic and Q-learning based DSA policies for cellular networks with coordinated access band

Growing demand for higher-data-rate applications, together with current spectrum crowding, has made Dynamic Spectrum Access (DSA) an active research topic. In this paper, we analyse DSA in the context of cellular networks, where a Coordinated Access Band (CAB) is shared between Radio Access Networks (RANs). We propose a Semi-Markov Decision Process (SMDP) approach to derive the DSA policies that are optimal in terms of operator reward. To overcome the limitations of implementing the optimal policy, we also propose two simple, though sub-optimal, DSA algorithms: a Q-learning (QL) based algorithm and a heuristic algorithm. The reward achieved with the heuristic algorithm is shown to be very close to the optimal one and thus to significantly exceed the reward obtained with Fixed Spectrum Access (FSA). The rewards achieved with the QL-based algorithm are also shown to exceed those obtained with FSA. The higher rewards and better spectrum utilisation of the optimal and heuristic DSA methods are, however, obtained at the price of a reduced average user throughput. Copyright © 2010 John Wiley & Sons, Ltd.
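
For readers unfamiliar with Q-learning, the following is a minimal sketch of a generic tabular update with epsilon-greedy action selection. It uses the standard discounted form; the abstract does not specify the variant, state space, actions or reward used in the paper, so the state and action labels (`"low_load"`, `"lease"`, `"release"`), the learning rate `alpha`, the discount `gamma` and the function names are all illustrative assumptions, not the authors' algorithm.

```python
import random


def q_update(q, state, action, reward, next_state, actions,
             alpha=0.1, gamma=0.9):
    """Apply one standard (discounted) tabular Q-learning step.

    q is a dict mapping (state, action) to an estimated value;
    missing entries are treated as 0.0. alpha and gamma are
    assumed hyperparameters, not values taken from the paper.
    """
    best_next = max(q.get((next_state, a), 0.0) for a in actions)
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
    return q[(state, action)]


def epsilon_greedy(q, state, actions, epsilon=0.1, rng=random):
    """Explore with probability epsilon, otherwise pick the best-valued action."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: q.get((state, a), 0.0))


# Hypothetical labels: an operator deciding whether to lease CAB spectrum.
ACTIONS = ["lease", "release"]
q_table = {}
updated = q_update(q_table, "low_load", "lease", reward=1.0,
                   next_state="high_load", actions=ACTIONS)
choice = epsilon_greedy(q_table, "low_load", ACTIONS, epsilon=0.0)
```

With `epsilon=0.0` the selection is purely greedy, so after the single update above it picks the only action with a positive estimate.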
