Network load balancing strategy based on supervised reinforcement learning with shaping rewards

This paper proposes a supervised reinforcement learning (SRL) algorithm with shaping rewards for network load balancing. We define the router index as the state set; design an additional distance-improving reward and a load-balancing reward to construct the supervisor; adopt the epsilon-greedy algorithm as the action selection strategy; and prove that the state transition matrix is deterministic. Simulations demonstrate that, by maximizing the sum of discounted rewards, SRL is an effective controller for network load balancing: each router can apply the algorithm to compute the optimal path to any other router subject to the load balancing requirement.
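The ingredients named in the abstract (router index as state, a distance-improving reward, a load-balancing reward, epsilon-greedy action selection, and deterministic transitions) can be illustrated with a minimal tabular Q-learning sketch. This is not the paper's implementation: the abstract gives no reward formulas, so the four-router topology `LINKS`, the weights `W_DIST` and `W_LOAD`, the goal bonus, the hop penalty, and all function names below are assumptions chosen only for illustration.

```python
import random
from collections import deque

# Hypothetical topology: LINKS[i][j] is the load (0..1) on the link i -> j.
LINKS = {
    0: {1: 0.9, 2: 0.1},
    1: {0: 0.9, 3: 0.9},
    2: {0: 0.1, 3: 0.1},
    3: {1: 0.9, 2: 0.1},
}
W_DIST, W_LOAD = 1.0, 2.0  # assumed weights of the two shaping rewards


def hop_distances(dest):
    """Breadth-first search: hop count from every router to dest."""
    dist = {dest: 0}
    queue = deque([dest])
    while queue:
        u = queue.popleft()
        for v in LINKS[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def reward(s, s2, dest, dist):
    """Base reward plus the two (assumed) shaping terms."""
    base = 10.0 if s2 == dest else -1.0       # goal bonus or per-hop cost
    closer = W_DIST * (dist[s] - dist[s2])     # distance-improving reward
    load = -W_LOAD * LINKS[s][s2]              # load-balancing reward
    return base + closer + load


def train(dest, episodes=2000, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    dist = hop_distances(dest)
    # State = router index, action = next-hop neighbor.
    q = {s: {a: 0.0 for a in LINKS[s]} for s in LINKS}
    starts = [r for r in LINKS if r != dest]
    for _ in range(episodes):
        s = rng.choice(starts)
        for _ in range(20):  # cap on hops per episode
            if rng.random() < epsilon:          # explore
                a = rng.choice(list(q[s]))
            else:                               # exploit
                a = max(q[s], key=q[s].get)
            s2 = a  # deterministic transition: choosing a hop moves there
            r = reward(s, s2, dest, dist)
            target = r if s2 == dest else r + gamma * max(q[s2].values())
            q[s][a] += alpha * (target - q[s][a])
            s = s2
            if s == dest:
                break
    return q


def greedy_path(q, src, dest, max_hops=10):
    """Follow the highest-valued next hop from src until dest is reached."""
    path = [src]
    while path[-1] != dest and len(path) <= max_hops:
        path.append(max(q[path[-1]], key=q[path[-1]].get))
    return path


if __name__ == "__main__":
    q = train(dest=3)
    print(greedy_path(q, 0, 3))
```

In this toy topology both routes from router 0 to router 3 have two hops, so the distance-improving term cannot separate them; the load-balancing term is what steers the learned greedy path away from the heavily loaded links through router 1.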
