Routing is a key issue for maintaining good performance and operating a network successfully. In this paper we focus on neuro-dynamic programming to construct dynamic, state-dependent routing policies, which offer several advantages: a stochastic model of the environment, continual learning and evaluation, multi-path routing, and low state overhead. This paper describes an adaptive algorithm for high-speed packet routing in irregular networks based on reinforcement learning, called N Q-routing Optimal Shortest Paths (NQOSP). In contrast with other algorithms also based on reinforcement learning methods, NQOSP combines a multi-path routing technique with the Q-routing algorithm: the exploration space is limited to the N optimal loop-free paths in terms of hop count (the number of routers on a path), leading to a substantial reduction in convergence time. We propose a framework to describe our algorithm and focus on improving the scalability and robustness of our approach. We also integrate a module that dynamically computes a probability in order to better distribute traffic over the best paths. The performance of NQOSP is evaluated experimentally with the OPNET simulator under different traffic loads and compared to standard shortest-path and Q-routing algorithms on a large interconnected network. Our approach proves superior to these classical algorithms and routes efficiently in large networks even when critical conditions, such as link failures, vary dynamically.
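The idea described above can be sketched in code. The snippet below is a minimal illustration, not the paper's exact formulation: it assumes the standard Q-routing update (a node's delivery-time estimate via a neighbour is nudged toward the local queueing and transmission delays plus the neighbour's own best estimate), restricts candidate next hops to neighbours lying on the N optimal loop-free paths, and selects among them with probability inversely proportional to their estimated delivery time. All names and parameters (`NQRouter`, `learning_rate`, the inverse-delay weighting) are illustrative assumptions.

```python
import random

class NQRouter:
    """Sketch of Q-routing restricted to next hops on the N optimal
    loop-free shortest paths (illustrative, not the paper's exact code)."""

    def __init__(self, candidate_next_hops, learning_rate=0.5):
        # candidate_next_hops: dest -> list of neighbours that lie on
        # one of the N shortest loop-free paths toward dest
        self.candidates = candidate_next_hops
        self.eta = learning_rate
        # Q[(dest, y)]: estimated delivery time to dest via neighbour y
        self.Q = {(d, y): 1.0
                  for d, ys in candidate_next_hops.items() for y in ys}

    def select(self, dest):
        """Pick a next hop with probability inversely proportional to
        its estimated delivery time, spreading load over the best paths."""
        ys = self.candidates[dest]
        weights = [1.0 / self.Q[(dest, y)] for y in ys]
        r, acc = random.random() * sum(weights), 0.0
        for y, w in zip(ys, weights):
            acc += w
            if r <= acc:
                return y
        return ys[-1]

    def update(self, dest, y, queue_delay, trans_delay, y_best_estimate):
        """Standard Q-routing update: move the estimate toward the local
        delays plus the neighbour's reported best remaining time."""
        target = queue_delay + trans_delay + y_best_estimate
        self.Q[(dest, y)] += self.eta * (target - self.Q[(dest, y)])
```

Restricting `candidates` to the N shortest loop-free paths is what shrinks the exploration space relative to plain Q-routing, while the weighted `select` plays the role of the probability module that distributes traffic over the best paths.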