Flow-Based Routing for Irregular Traffic Using a Reinforcement Learning Approach in Dynamic Networks

Routing is a key issue for maintaining good performance and operating a network successfully. In this paper we focus on neuro-dynamic programming to construct dynamic, state-dependent routing policies, which offer several advantages: a stochastic model of the environment, continual learning and evaluation, multi-path routing, and minimal state overhead. This paper describes an adaptive algorithm for high-speed irregular packet routing based on reinforcement learning, called N Q-routing Optimal Shortest Paths (NQOSP). In contrast with other algorithms that are also based on reinforcement learning methods, NQOSP combines a multi-path routing technique with the Q-routing algorithm. The exploration space is thus limited to the N optimal loop-free paths in terms of hop count (the number of routers on a path), leading to a substantial reduction in convergence time. We propose a framework to describe our algorithm and focus on improving the scalability and robustness of our approach. We also integrate a module that dynamically computes a probability in order to better distribute traffic over the best paths. The performance of NQOSP is evaluated experimentally with the OPNET simulator for different levels of traffic load and compared to standard shortest-path and Q-routing algorithms on a large interconnected network. Our approach proves superior to these classical algorithms and is able to route efficiently in large networks even when critical aspects, such as link failures, are allowed to vary dynamically.
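To make the idea concrete, the following is a minimal sketch of a per-node agent in the spirit described above: a standard Q-routing update of per-neighbor delivery-time estimates, with the choice of next hop restricted to neighbors lying on a precomputed set of N fewest-hop loop-free paths, and a probability (softmax over estimates) used to spread traffic across those candidates. The class name `QRouter`, the learning rate value, and the `temperature` parameter are illustrative assumptions, not part of the paper's specification.

```python
import math
import random

ETA = 0.7  # learning rate for the Q-value update (assumed value)


class QRouter:
    """One node's routing agent (hypothetical simplified sketch).

    Learns per-neighbor estimates of total delivery time, but only over
    neighbors that lie on one of the N-optimal (fewest-hop) loop-free
    paths toward each destination, shrinking the exploration space.
    """

    def __init__(self, node, candidate_next_hops):
        # candidate_next_hops[dest] -> neighbors on one of the N-optimal paths
        self.node = node
        self.candidates = candidate_next_hops
        # Q[(neighbor, dest)] = estimated delivery time to dest via neighbor
        self.Q = {(y, d): 0.0
                  for d, ys in candidate_next_hops.items() for y in ys}

    def best_estimate(self, dest):
        """Minimum estimated delivery time from this node to dest."""
        return min(self.Q[(y, dest)] for y in self.candidates[dest])

    def choose(self, dest, temperature=1.0):
        """Pick a next hop with probability favoring lower estimates,
        so load is distributed over the best candidate paths."""
        ys = self.candidates[dest]
        weights = [math.exp(-self.Q[(y, dest)] / temperature) for y in ys]
        return random.choices(ys, weights=weights)[0]

    def update(self, y, dest, queue_delay, link_delay, neighbor_estimate):
        """Q-routing update: move the estimate toward the observed local
        delays plus the chosen neighbor's own best estimate to dest."""
        target = queue_delay + link_delay + neighbor_estimate
        self.Q[(y, dest)] += ETA * (target - self.Q[(y, dest)])
```

For example, after node A forwards a packet for destination D via neighbor B and learns B's own best estimate, `update("B", "D", queue_delay, link_delay, neighbor_estimate)` pulls A's estimate toward the observed cost, while `choose("D")` keeps spreading subsequent packets over all candidate next hops in proportion to their current estimates.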