An Adaptive Network Routing Strategy with Temporal Differences

This paper describes TD-Routing, an adaptive packet-routing algorithm based on the Temporal Differences method TD(λ), and compares its performance with that of other routing strategies: Shortest Path Routing, Bellman-Ford, and Q-Routing. Both high and low network traffic conditions are considered. Unlike other algorithms that are also based on Reinforcement Learning (RL), TD-Routing is able to discover good policies when network traffic decreases. The performance of the proposed algorithm was evaluated on a benchmark network configuration of 16 nodes under different traffic conditions and in different topologies. The simulations show that TD-Routing outperforms other RL-based algorithms in terms of learning speed and adaptability.
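The abstract does not give the TD-Routing update rule itself. Purely as an illustration, the sketch below assumes a Q-Routing-style table of estimated delivery times, keyed by (node, destination, neighbor), and applies a standard backward-view TD(λ) update with accumulating eligibility traces along one packet's path. The function name `td_lambda_path_update`, the step size `eta`, the trace decay `lam`, and the per-packet trace are hypothetical choices for this sketch. It is not claimed to be the paper's algorithm.

```python
def td_lambda_path_update(Q, path, dest, costs, neighbors, eta=0.5, lam=0.7):
    """Hypothetical backward-view TD(lambda) update of delivery-time estimates
    along one packet's path (a sketch, not the paper's exact TD-Routing rule).

    Q         -- dict: (node, dest, neighbor) -> estimated time to reach dest via neighbor
    path      -- nodes visited by the packet, with path[-1] == dest
    costs     -- costs[t]: observed queueing + transmission time of hop path[t] -> path[t+1]
    neighbors -- dict: node -> list of adjacent nodes
    eta, lam  -- assumed step size and trace-decay parameter (no discounting)
    """
    trace = {}  # accumulating eligibility traces for (node, dest, neighbor) entries
    for t in range(len(path) - 1):
        x, y = path[t], path[t + 1]
        # One-step target: observed hop cost plus y's best remaining estimate,
        # which is zero once the packet has reached its destination.
        if y == dest:
            remaining = 0.0
        else:
            remaining = min(Q.get((y, dest, z), 0.0) for z in neighbors[y])
        delta = costs[t] + remaining - Q.get((x, dest, y), 0.0)
        # Decay all existing traces, then mark the entry used on this hop.
        for key in trace:
            trace[key] *= lam
        trace[(x, dest, y)] = trace.get((x, dest, y), 0.0) + 1.0
        # Credit the TD error to every entry still eligible on the path so far.
        for key, e in trace.items():
            Q[key] = Q.get(key, 0.0) + eta * delta * e
    return Q


# Usage on a hypothetical 3-node line network 0 - 1 - 2, packet sent from 0 to 2.
neighbors = {0: [1], 1: [0, 2], 2: [1]}
Q = td_lambda_path_update({}, [0, 1, 2], 2, [1.0, 2.0], neighbors)
print(Q[(0, 2, 1)], Q[(1, 2, 2)])
```

With `lam=0.0` the traces of earlier hops vanish and each hop receives only its own one-step correction, in the style of Q-Routing. With `lam > 0` the cost observed at later hops also flows back to earlier routing decisions within the same update, which is one plausible reason a TD(λ) variant could adapt faster.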