Scalable Deep Reinforcement Learning for Routing and Spectrum Access in Physical Layer

This paper proposes a novel scalable reinforcement learning approach for simultaneous routing and spectrum access in wireless ad-hoc networks. In most previous works on reinforcement learning for network optimization, the network topology is assumed to be fixed, and a different agent is trained for each transmission node, which limits scalability and generalizability. Further, routing and spectrum access are typically treated as separate tasks. Moreover, the optimization objective is usually a cumulative metric along the route, e.g., number of hops or delay. In this paper, we account for the physical-layer signal-to-interference-plus-noise ratio (SINR) in a wireless network and further show that a bottleneck objective, such as the minimum SINR along the route, can also be optimized effectively using reinforcement learning. Specifically, we propose a scalable approach in which a single agent is associated with each flow and makes routing and spectrum access decisions as it moves along the frontier nodes. The agent is trained according to the physical-layer characteristics of the environment using a novel reward scheme based on the Monte Carlo estimation of the future bottleneck SINR. It learns to avoid interference by making joint routing and spectrum allocation decisions based on the geographical location information of the neighbouring nodes.
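To make the bottleneck objective concrete, the following is a minimal sketch of how a Monte Carlo target for the future bottleneck SINR could be computed once a route has been completed. It is an illustration under stated assumptions, not the paper's implementation: the function names `sinr` and `bottleneck_targets`, the linear-scale power values, and the choice of the minimum over the remaining hops as the per-step target are all hypothetical.

```python
from typing import List


def sinr(signal_power: float, interference_powers: List[float], noise_power: float) -> float:
    """SINR of one link in linear scale: signal / (total interference + noise)."""
    return signal_power / (sum(interference_powers) + noise_power)


def bottleneck_targets(hop_sinrs: List[float]) -> List[float]:
    """For each hop t, return the minimum SINR over hops t..end of the route.

    Assumption: this backward running minimum plays the role that the
    discounted sum of rewards plays for a cumulative objective.
    """
    targets = [0.0] * len(hop_sinrs)
    running_min = float("inf")
    # Sweep backward so each step sees the bottleneck of everything after it.
    for t in range(len(hop_sinrs) - 1, -1, -1):
        running_min = min(running_min, hop_sinrs[t])
        targets[t] = running_min
    return targets


# Hypothetical per-hop SINR values observed along one completed route.
hop_sinrs = [8.0, 3.0, 5.0, 4.0]
targets = bottleneck_targets(hop_sinrs)
print(targets)  # [3.0, 3.0, 4.0, 4.0]
```

Unlike a hop count or delay, the minimum does not decompose into a sum of per-step rewards, which is why the sketch computes targets only after the whole route is known.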
