This paper describes and evaluates how confidence values can be used to improve the quality of exploration in QRouting for adaptive packet routing in communication networks. In Q-Routing each node in the network has a routing decision maker that adapts, on-line, to learn routing policies that can sustain high network loads and have low average packet delivery time. These decision makers maintain their view of the network in terms of Q values which are updated as the routing takes place. In Confidence based Q-Routing (CQ-Routing), the improved implementation of Q-Routing with confidence values, each Q value is attached with a confidence (C value) which is a measure of how closely the corresponding Q value represents the current state of the network. While the learning rate in Q-Routing is a constant, the learning rate in CQ-Routing is computed as a function of confidence values of the old and estimated Q values for each update. If either the old Q value has a low confidence or the estimated Q value has a high confidence, the learning rate is high. The quality of exploration is improved in CQ-Routing as a result of this variable learning rate. Experiments over several network topologies have shown that at low and medium loads, CQ-Routing learns the adequate routing policies significantly faster than Q-Routing, and at high loads, it learns routing policies that are significantly better than those learned by Q-Routing in terms of average packet delivery time. CQ-Routing is able to sustain higher network loads than Q-Routing, non-adaptive shortest-path routing and adaptive Bellman-Ford Routing. Finally, CQ-Routing was found to adapt significantly faster than Q-Routing to changes in network topology.
[1]
Dit-Yan Yeung,et al.
Predictive Q-Routing: A Memory-based Reinforcement Learning Approach to Adaptive Traffic Control
,
1995,
NIPS.
[2]
Shailesh Kumar and Risto Miikkulainen.
Dual Reinforcement Q-Routing: An On-Line Adaptive Routing Algorithm
,
1997
.
[3]
Michael L. Littman,et al.
Packet Routing in Dynamically Changing Networks: A Reinforcement Learning Approach
,
1993,
NIPS.
[4]
Shailesh Kumar,et al.
Confidence based Dual Reinforcement Q-Routing: an On-line Adaptive NetworkRouting Algorithm
,
1998
.
[5]
Thomas G. Dietterich.
What is machine learning?
,
2020,
Archives of Disease in Childhood.