A Distributed Reinforcement Learning Scheme for Network Routing

In this paper we describe a self-adjusting algorithm for packet routing, in which a reinforcement learning module is embedded into each node of a switching network. Using only local communication, each node maintains accurate statistics on which routing decisions lead to minimal delivery times. In simple experiments involving a 36-node, irregularly connected network, this learning approach proves superior to a nonadaptive algorithm based on precomputed shortest paths.
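To make the idea of a per-node learning module concrete, the following is a minimal sketch only. The abstract does not state the update rule, so the sketch assumes a Q-learning-style rule in which a node moves its delivery-time estimate toward the delay of one hop plus the best remaining-time estimate reported by the neighbor it forwarded to. The class `Node`, its methods, and the `learning_rate` parameter are hypothetical names introduced for illustration.

```python
from collections import defaultdict


class Node:
    """One routing node holding purely local delivery-time estimates.

    Hypothetical illustration: names and the update rule are assumptions,
    not taken from the abstract.
    """

    def __init__(self, name, neighbors, learning_rate=0.5):
        self.name = name
        self.neighbors = list(neighbors)
        self.learning_rate = learning_rate
        # estimate[dest][neighbor]: estimated time to deliver a packet
        # bound for `dest` when it is forwarded via `neighbor`.
        # Unseen destinations start with an estimate of 0.0 per neighbor.
        self.estimate = defaultdict(lambda: {n: 0.0 for n in self.neighbors})

    def best_estimate(self, dest):
        """Smallest estimated remaining delivery time from this node."""
        if dest == self.name:
            return 0.0
        return min(self.estimate[dest].values())

    def choose_next_hop(self, dest):
        """Greedy policy: forward via the neighbor with the lowest estimate."""
        table = self.estimate[dest]
        return min(table, key=table.get)

    def update(self, dest, neighbor, hop_delay, neighbor_estimate):
        """Move the local estimate toward hop delay plus the neighbor's report.

        `hop_delay` is the assumed queueing plus transmission time for this
        hop; `neighbor_estimate` is what the neighbor reports back as its own
        best remaining-time estimate (the only communication required).
        """
        target = hop_delay + neighbor_estimate
        old = self.estimate[dest][neighbor]
        self.estimate[dest][neighbor] = old + self.learning_rate * (target - old)
```

In use, a node would call `choose_next_hop(dest)` for each packet, forward it, receive the chosen neighbor's `best_estimate(dest)` in reply, and then call `update`. Because each update needs only that single reply from an adjacent node, the scheme requires no global view of the network, which is the property the abstract emphasizes.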