Adaptive transmission scheduling over fading channels for energy-efficient cognitive radio networks by reinforcement learning

In this paper, we address a cross-layer problem of long-term average utility maximization in energy-efficient cognitive radio networks that support packetized data traffic, subject to a constraint on the collision rate with licensed users. The utility is determined by the number of packets successfully transmitted per unit of consumed power and by the buffer occupancy. We formulate the problem with a dynamic programming method, namely a constrained Markov decision process (CMDP). A reinforcement learning (RL) approach is employed to find a near-optimal policy in an unknown environment. The policy learned by RL guides the transmitter to access available channels and to select a proper transmission rate at the beginning of each frame so as to achieve its long-term optimization goals. Several implementation issues of the RL approach are discussed. First, state-space compaction is used to cope with the so-called curse of dimensionality caused by the large state space of the formulated CMDP. Second, action-set reduction is presented to reduce the number of actions for some system states. Finally, the CMDP is converted to a corresponding unconstrained Markov decision process (UMDP) through a Lagrangian multiplier approach, and a golden-section search method is proposed to find the proper multiplier. To evaluate the performance of the policy learned by RL, we present two naive policies and compare them with it through simulations.
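The abstract's CMDP-to-UMDP conversion and golden-section search can be illustrated with a minimal sketch. It is not the authors' code: the collision-rate constraint is folded into the per-frame reward through a Lagrange multiplier, and the multiplier is tuned by a standard golden-section search so that the policy learned under the relaxed reward meets the target collision rate. The helpers `train_policy_rl`, `evaluate_collision_rate`, and `TARGET_COLLISION_RATE` are hypothetical placeholders for the RL training and simulation steps described in the paper.

```python
# Sketch only, assuming a single Lagrange multiplier lam and a unimodal
# constraint gap; the RL training and evaluation routines are placeholders.

GOLDEN = 0.618033988749895  # golden-ratio conjugate used by the search

def lagrangian_reward(utility, collided, lam):
    """UMDP reward: per-frame utility minus lam times the collision indicator."""
    return utility - lam * (1.0 if collided else 0.0)

def golden_section_search(constraint_gap, lo, hi, tol=1e-3):
    """Find lam in [lo, hi] minimizing constraint_gap(lam), assumed unimodal."""
    a, b = lo, hi
    x1 = b - GOLDEN * (b - a)
    x2 = a + GOLDEN * (b - a)
    f1, f2 = constraint_gap(x1), constraint_gap(x2)
    while b - a > tol:
        if f1 < f2:
            # Minimum lies in [a, x2]; reuse x1 as the new upper interior point.
            b, x2, f2 = x2, x1, f1
            x1 = b - GOLDEN * (b - a)
            f1 = constraint_gap(x1)
        else:
            # Minimum lies in [x1, b]; reuse x2 as the new lower interior point.
            a, x1, f1 = x1, x2, f2
            x2 = a + GOLDEN * (b - a)
            f2 = constraint_gap(x2)
    return 0.5 * (a + b)

# Example wiring (hypothetical helpers, not part of the paper):
# def constraint_gap(lam):
#     policy = train_policy_rl(lambda u, c: lagrangian_reward(u, c, lam))
#     return abs(evaluate_collision_rate(policy) - TARGET_COLLISION_RATE)
#
# lam_star = golden_section_search(constraint_gap, 0.0, 50.0)
```

Each evaluation of `constraint_gap` requires retraining or re-evaluating the RL policy under the corresponding Lagrangian reward, which is why a derivative-free bracketing method such as golden-section search is a natural fit here.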
