Reinforcement learning for context awareness and intelligence in wireless networks: Review, new features and open issues

In wireless networks, context awareness and intelligence are capabilities that enable each host to observe, learn from, and respond efficiently to its complex and dynamic operating environment. These capabilities contrast with traditional approaches, in which each host adheres to a predefined set of rules and responds accordingly. In recent years, context awareness and intelligence have gained tremendous popularity because of the substantial network-wide performance enhancement they offer. In this article, we advocate the use of reinforcement learning (RL) to achieve context awareness and intelligence. The RL approach has been applied to a variety of schemes, such as routing, resource management and dynamic channel selection, in wireless networks including mobile ad hoc networks, wireless sensor networks, cellular networks and cognitive radio networks. This article presents an overview of classical RL and of three extensions, namely events, rules, and agent interaction and coordination, as applied to wireless networks. We discuss how several wireless network schemes have been approached using RL to enhance network performance, and we identify open issues associated with this approach. Throughout the paper, discussions are presented in a tutorial manner and are related to existing work in order to establish a foundation for further research in this field, specifically: improving the RL approach in the context of wireless networking, improving the RL approach in existing schemes through the use of the extensions, and designing and implementing RL in new schemes.
