Distributed Multi-Hop Traffic Engineering via Stochastic Policy Gradient Reinforcement Learning

Multi-hop networks (e.g., mesh, ad-hoc, and sensor networks) are important and cost-efficient communication backbones. Over the last few years, wireless data traffic has increased drastically due to changes in the way today's society creates, shares, and consumes information. This demands efficient and intelligent utilization of limited network resources to optimize network performance. Traffic engineering (TE) addresses this need by computing forwarding and routing rules that meet the quality-of-service (QoS) requirements of a large volume of traffic flows. This paper proposes a distributed, model-free TE solution based on stochastic policy gradient reinforcement learning (RL), which learns a stochastic routing policy for each router so that each router forwards a packet to a next-hop router according to the learned optimal probability distribution. The proposed policy-gradient solution naturally leads to multi-path TE strategies, which effectively distribute high traffic loads among all available routing paths to minimize the end-to-end (E2E) delay. Moreover, a distributed software-defined networking (SDN) architecture is proposed, which enables fast prototyping of the proposed multi-agent actor-critic TE (MA-AC TE) algorithm and inherently supports automated TE through multi-agent RL.
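
Below is a minimal sketch of the kind of per-router stochastic routing policy with actor-critic updates the abstract describes. It assumes a tabular, per-destination softmax parameterization; the names (RouterAgent, hop_delay, next_value) are illustrative assumptions, not the paper's MA-AC TE implementation.

import numpy as np

class RouterAgent:
    """One RL agent per router: a softmax actor over candidate next hops
    and a tabular critic estimating the remaining delay per destination."""

    def __init__(self, neighbors, num_dests,
                 lr_actor=0.01, lr_critic=0.1, gamma=0.99):
        self.neighbors = neighbors                           # candidate next-hop routers
        self.theta = np.zeros((num_dests, len(neighbors)))   # actor: logits per destination
        self.value = np.zeros(num_dests)                     # critic: estimated delay-to-go
        self.lr_a, self.lr_c, self.gamma = lr_actor, lr_critic, gamma

    def policy(self, dest):
        # Stochastic routing policy: softmax over next-hop logits.
        logits = self.theta[dest]
        probs = np.exp(logits - logits.max())
        return probs / probs.sum()

    def choose_next_hop(self, dest):
        # Sampling (rather than taking the arg-max) spreads traffic over
        # multiple paths in proportion to their learned probabilities.
        probs = self.policy(dest)
        idx = np.random.choice(len(self.neighbors), p=probs)
        return idx, self.neighbors[idx]

    def update(self, dest, action_idx, hop_delay, next_value):
        # One actor-critic step; the reward is the negative per-hop delay,
        # so maximizing return minimizes E2E delay.
        td_error = -hop_delay + self.gamma * next_value - self.value[dest]
        self.value[dest] += self.lr_c * td_error             # critic update
        probs = self.policy(dest)
        grad_log = -probs                                    # d(log pi)/d(logits) for softmax
        grad_log[action_idx] += 1.0
        self.theta[dest] += self.lr_a * td_error * grad_log  # actor update

In a fully distributed deployment, each router would run one such agent locally and obtain next_value from the downstream router's critic when a packet is handed off, so no global network model is needed.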
