Constrained Deep Reinforcement Learning for Smart Load Balancing

In this paper, we explore the use of an actor-critic architecture for Deep Reinforcement Learning (DRL) to improve load balancing beyond traditional algorithms. Some centralized Reinforcement Learning (RL) algorithms build their reward function on the Quality of Experience (QoE) of video flows, which requires access to clients, or on the Maximum Link Utilization (MLU) for other types of flows. In our approach, we tune the actor-critic algorithm to rely only on Quality of Service (QoS) parameters when balancing traffic across the network, while still aiming to maximize the QoE experienced by users. This avoids collecting observations and performance measurements from client applications, since the reward depends only on network metrics that can be easily measured. We explore both centralized and distributed solutions to assess the feasibility of the proposed smart load balancing approach. We compare them against Equal-Cost Multi-Path (ECMP) routing, QoE-based reward methods, and RILNET, which uses an underlying Deep Deterministic Policy Gradient (DDPG) optimization approach. The proposed algorithms are shown to outperform these previous approaches.
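To make the idea of a QoS-only reward concrete, the following is a minimal sketch of how such a reward could be computed from link-level measurements alone. It is an illustration under stated assumptions, not the reward used in this paper: the `LinkStats` fields, the `qos_reward` function, the choice of delay and loss as the QoS terms, and the weights are all hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class LinkStats:
    """Hypothetical per-link measurements available without client access."""
    load_bps: float      # traffic currently carried by the link
    capacity_bps: float  # link capacity
    delay_ms: float      # measured one-way delay on the link
    loss_rate: float     # fraction of packets lost, in [0, 1]


def max_link_utilization(links: Sequence[LinkStats]) -> float:
    """MLU: the highest load-to-capacity ratio over all links."""
    if not links:
        raise ValueError("at least one link is required")
    return max(link.load_bps / link.capacity_bps for link in links)


def qos_reward(
    links: Sequence[LinkStats],
    w_mlu: float = 1.0,     # assumed weight, for illustration only
    w_delay: float = 0.01,  # assumed weight, for illustration only
    w_loss: float = 10.0,   # assumed weight, for illustration only
) -> float:
    """Negative weighted cost built from network-side metrics only.

    A higher (less negative) reward means lower peak utilization,
    lower mean delay, and lower mean loss across the links.
    """
    mlu = max_link_utilization(links)
    mean_delay = sum(link.delay_ms for link in links) / len(links)
    mean_loss = sum(link.loss_rate for link in links) / len(links)
    return -(w_mlu * mlu + w_delay * mean_delay + w_loss * mean_loss)


# Usage: the same total load split evenly vs. unevenly over two links.
balanced = [LinkStats(60e6, 100e6, 12.0, 0.0), LinkStats(60e6, 100e6, 12.0, 0.0)]
skewed = [LinkStats(95e6, 100e6, 30.0, 0.02), LinkStats(25e6, 100e6, 8.0, 0.0)]
print(qos_reward(balanced), qos_reward(skewed))
```

In an actor-critic setting, a scalar of this kind would be returned by the environment after each routing action, so the critic learns from quantities that the network operator can measure directly.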