A Generalized Load Balancing Policy With Multi-Teacher Reinforcement Learning

Although reinforcement learning (RL) shows advantages in cellular network load balancing, it suffers from low generalization ability, which prevents its deployment in real-world applications. Specifically, when the network traffic pattern changes, the learned RL policy cannot adapt accordingly, and system performance degrades. To address this issue, we propose a Multi-teacher MOdel BAsed Reinforcement Learning algorithm (MOBA), which leverages multi-teacher knowledge distillation to learn a generalized load balancing policy that adapts to real-world traffic pattern changes. The key idea is that different teachers, each exposed to a different traffic pattern, learn distinct system models. By distilling and transferring the teachers' knowledge, the student network learns a generalized system model that covers different traffic patterns as well as unseen situations. Moreover, to improve the robustness of multi-teacher knowledge transfer, we learn a set of student models and use an ensemble method to jointly predict the system dynamics. Results show that, compared with state-of-the-art RL methods, MOBA improves the minimal throughput and total throughput of a cellular network by up to 28.6% and 23.2%, respectively. Results also show that MOBA improves training efficiency by up to 64%.
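To make the multi-teacher distillation and student-ensemble idea concrete, the following is a minimal sketch in PyTorch. It is an illustration under assumptions, not the paper's implementation: all names (`make_dynamics_model`, `distill_students`, `ensemble_predict`) are hypothetical, and the soft target is simply the mean of the teachers' next-state predictions, whereas the paper may weight or select teachers differently.

```python
# Hypothetical sketch of multi-teacher distillation into an ensemble of
# student dynamics models, in the spirit of MOBA. Not the authors' code;
# teacher weighting, architectures, and losses are assumptions.
import torch
import torch.nn as nn

def make_dynamics_model(state_dim, action_dim, hidden=64):
    # A small MLP that predicts the next state from (state, action).
    return nn.Sequential(
        nn.Linear(state_dim + action_dim, hidden),
        nn.ReLU(),
        nn.Linear(hidden, state_dim),
    )

def distill_students(teachers, students, transitions, epochs=10, lr=1e-3):
    """Train each student to match the averaged teacher prediction.

    teachers    -- dynamics models pre-trained on distinct traffic patterns
    students    -- ensemble of student models being distilled
    transitions -- tensor of shape (N, state_dim + action_dim)
    """
    opts = [torch.optim.Adam(s.parameters(), lr=lr) for s in students]
    loss_fn = nn.MSELoss()
    with torch.no_grad():
        # Soft target: mean of the teachers' next-state predictions.
        target = torch.stack([t(transitions) for t in teachers]).mean(dim=0)
    for _ in range(epochs):
        for student, opt in zip(students, opts):
            opt.zero_grad()
            loss = loss_fn(student(transitions), target)
            loss.backward()
            opt.step()

def ensemble_predict(students, state_action):
    # Jointly predict system dynamics by averaging over the student ensemble.
    with torch.no_grad():
        return torch.stack([s(state_action) for s in students]).mean(dim=0)

if __name__ == "__main__":
    state_dim, action_dim = 8, 2
    teachers = [make_dynamics_model(state_dim, action_dim) for _ in range(3)]
    students = [make_dynamics_model(state_dim, action_dim) for _ in range(5)]
    batch = torch.randn(256, state_dim + action_dim)
    distill_students(teachers, students, batch)
    print(ensemble_predict(students, batch).shape)  # torch.Size([256, 8])
```

Averaging several independently initialized students is one simple way to realize the ensemble prediction mentioned in the abstract; it reduces the variance of any single distilled model's dynamics estimate.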
