Stochastic Power Adaptation with Multiagent Reinforcement Learning for Cognitive Wireless Mesh Networks

As the scarce spectrum resource becomes overcrowded, cognitive radio offers great flexibility to improve spectrum efficiency by opportunistically accessing licensed frequency bands. One of the critical challenges in operating such radios in a network is how to efficiently allocate transmission power and frequency resources among the secondary users (SUs) while satisfying the quality-of-service constraints of the primary users. In this paper, we focus on the noncooperative power allocation problem in cognitive wireless mesh networks formed by a number of clusters, taking energy efficiency into consideration. Because the SUs behave dynamically and spontaneously, the problem is modeled as a stochastic learning process. We first extend single-agent Q-learning to a multiuser context, and then propose a conjecture-based multiagent Q-learning algorithm that achieves the optimal transmission strategies with only private and incomplete information. An intelligent SU performs Q-function updates based on its conjecture about the other SUs' stochastic behaviors. This learning algorithm provably converges under certain restrictions that arise during the learning procedure. Simulation experiments verify the performance of our algorithm and demonstrate its effectiveness in improving energy efficiency.
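For readers unfamiliar with the starting point of the approach, the following is a minimal sketch of the standard single-agent tabular Q-learning update that the abstract says is extended to the multiuser setting. It is not the paper's conjecture-based algorithm: the state and action encoding (two states, three discrete power levels), the reward value, and the names `q_update` and `epsilon_greedy` are all assumptions made for illustration only.

```python
import random

def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Apply one standard Q-learning update in place and return the new value.

    q is a list of rows, one per state; each row holds one value per action
    (here, hypothetically, one per discrete transmit power level).
    """
    best_next = max(q[next_state])
    q[state][action] += alpha * (reward + gamma * best_next - q[state][action])
    return q[state][action]

def epsilon_greedy(q, state, epsilon=0.1, rng=random):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q[state]))
    row = q[state]
    return max(range(len(row)), key=row.__getitem__)

# Hypothetical setup: 2 states, 3 power levels, all Q-values start at zero.
q = [[0.0] * 3 for _ in range(2)]

# One illustrative step: the SU used power level 1 in state 0, observed a
# made-up reward of 1.0 (e.g. an energy-efficiency measure), and moved to state 1.
q_update(q, state=0, action=1, reward=1.0, next_state=1)

# With exploration switched off, the SU now prefers the rewarded power level.
greedy_action = epsilon_greedy(q, state=0, epsilon=0.0)
```

In a multiagent setting, the reward each SU observes also depends on the other SUs' power choices, which is why the abstract replaces knowledge of those choices with a conjecture about them.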
