Toward Packet Routing with Fully-distributed Multi-agent Deep Reinforcement Learning

Packet routing is a fundamental problem in computer networks: a router determines the next hop of each packet in its queue so that the packet reaches its destination as quickly as possible. Reinforcement learning has been applied to design an autonomous packet routing policy, namely Q-routing, that uses only the local information available to each router. However, the curse of dimensionality prevents Q-routing from using a more comprehensive representation of dynamic network states, which limits the potential benefit of reinforcement learning. Inspired by the recent success of deep reinforcement learning (DRL), we embed deep neural networks in multi-agent Q-routing. Each router possesses an independent neural network that is trained without communicating with its neighbors and makes decisions locally. Two multi-agent DRL-enabled routing algorithms are proposed: one simply replaces the Q-table of vanilla Q-routing with a deep neural network, and the other additionally uses extra information, including past actions and the destinations of non-head-of-line packets. Our simulations show that directly substituting a deep neural network for the Q-table may not yield minimal delivery delays, because the neural network does not learn more from the same input. When more information is utilized, the adaptive routing policy can converge and significantly reduce the packet delivery time.
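
For readers unfamiliar with the tabular baseline, the following is a minimal sketch of a vanilla Q-routing agent, assuming the standard update in which a router moves its delivery-time estimate toward the sum of queueing delay, transmission delay, and the chosen neighbor's own best estimate. The class name `QRouter`, its method names, and the learning rate value are hypothetical and chosen for illustration; this is not the paper's DRL implementation, which replaces the table with a neural network.

```python
from collections import defaultdict


class QRouter:
    """Illustrative tabular Q-routing agent (hypothetical names throughout)."""

    def __init__(self, name, neighbors, alpha=0.5):
        self.name = name
        self.neighbors = list(neighbors)
        self.alpha = alpha  # learning rate (assumed value)
        # q[dest][neighbor] = estimated delivery time to dest via neighbor
        self.q = defaultdict(lambda: {n: 0.0 for n in self.neighbors})

    def best_estimate(self, dest):
        """Smallest estimated delivery time from this router to dest."""
        if dest == self.name:
            return 0.0
        return min(self.q[dest].values())

    def next_hop(self, dest):
        """Greedy choice: neighbor with the lowest estimated delivery time."""
        return min(self.neighbors, key=lambda n: self.q[dest][n])

    def update(self, dest, neighbor, queue_delay, link_delay, neighbor_estimate):
        """Move the estimate toward queue delay + link delay + neighbor's estimate."""
        target = queue_delay + link_delay + neighbor_estimate
        old = self.q[dest][neighbor]
        self.q[dest][neighbor] = old + self.alpha * (target - old)
        return self.q[dest][neighbor]
```

Each router uses only its own table and the single estimate returned by the neighbor it forwarded to, which is the local-information property described above. The table is indexed by destination alone, so adding further state (such as past actions or other queued packets' destinations) would multiply its size, which is the dimensionality limit that motivates the neural-network variants.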
