Paper information - An Adaptive Network Routing Strategy with Temporal Differences

This paper describes TD-Routing, an adaptive packet-routing algorithm based on the Temporal Differences TD(λ) method, and compares its performance with three other routing strategies: Shortest Path Routing, Bellman-Ford, and Q-Routing. Both high and low network traffic conditions are considered. In contrast with other algorithms based on Reinforcement Learning (RL), TD-Routing is able to discover good policies when network traffic decreases. The proposed algorithm was evaluated on a 16-node benchmark network under different traffic conditions and topologies. The simulations show that TD-Routing outperforms the other RL-based algorithms in learning speed and adaptability.
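For orientation, the Q-Routing baseline named in the abstract can be sketched as a per-node table of estimated delivery times that is nudged toward each neighbor's reported estimate. The sketch below assumes the standard Q-Routing update of Boyan and Littman. The class name `QRouter`, its method names, and the learning rate value are hypothetical choices made for illustration. It is not the TD-Routing algorithm itself, whose TD(λ) update rule is not given in this abstract.

```python
from collections import defaultdict


class QRouter:
    """Illustrative Q-Routing table for one node (hypothetical names)."""

    def __init__(self, neighbors, learning_rate=0.5):
        self.neighbors = list(neighbors)
        self.learning_rate = learning_rate  # assumed value, not from the paper
        # q[dest][neighbor]: estimated time to deliver a packet to dest
        # when it is forwarded through neighbor. Estimates start at 0.0.
        self.q = defaultdict(lambda: {n: 0.0 for n in self.neighbors})

    def best_estimate(self, dest):
        """Smallest estimated delivery time to dest over all neighbors."""
        return min(self.q[dest].values())

    def choose_next_hop(self, dest):
        """Greedy policy: forward to the neighbor with the lowest estimate."""
        table = self.q[dest]
        return min(table, key=table.get)

    def update(self, dest, neighbor, queue_delay, link_delay, neighbor_estimate):
        """Move q[dest][neighbor] toward the observed one-step target.

        target = time spent queued here + transmission time
                 + the neighbor's own best estimate for dest.
        """
        target = queue_delay + link_delay + neighbor_estimate
        old = self.q[dest][neighbor]
        self.q[dest][neighbor] = old + self.learning_rate * (target - old)
        return self.q[dest][neighbor]
```

A node would call `choose_next_hop` when forwarding a packet and `update` once the chosen neighbor reports back its own `best_estimate` for the destination. Because the greedy policy only refreshes the estimate of the route currently in use, estimates for routes abandoned during congestion can stay stale after the load drops, which relates to the reduced-traffic case the abstract highlights.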

Marley M. B. R. Vellasco | Marco Aurélio Cavalcanti Pacheco | Yván J. Túpac Valdivia
