Efficient routing of information packets in dynamically changing communication networks requires that the routing policy adapt as the load levels, traffic patterns, and topology of the network change. Making globally optimal routing decisions would require a central observer/controller with complete information about the state of every node and link in the network, which is not realistic. Routing decisions must therefore be made locally by individual nodes (routers) using only local routing information. The routing information at a node could consist of estimates of packet delivery time to other nodes via its neighbors, or estimates of the queue lengths of other nodes in the network. An adaptive routing algorithm should efficiently explore and update the routing information available at different nodes as it routes packets, and it should continuously evolve efficient routing policies with minimum overhead on network resources.

In this thesis, an on-line adaptive network routing algorithm called {\sc Confidence-based Dual Reinforcement Q-Routing} ({\sc CDRQ-Routing}), based on the Q-learning framework, is proposed and evaluated. In this framework, the routing information at individual nodes is maintained as Q-value estimates of how long it will take to send a packet to any particular destination via each of the node's neighbors. These Q values are updated through exploration as packets are transmitted. The main contribution of this work is faster adaptation and improved quality of routing policies compared to Q-Routing. The improvement is based on two ideas. First, the quality of exploration is improved by attaching to each Q value a confidence measure that represents how reliable that Q value is; the learning rate is a function of these confidence values. Second, the quantity of exploration is increased by adding backward exploration to Q-learning. As a packet hops from one node to another, it not only updates a Q value in the sending node (forward exploration, as in Q-Routing), but also updates a Q value in the receiving node using information appended to the packet when it is sent out (backward exploration). Thus two Q-value updates occur per packet hop in CDRQ-Routing, as against only one in Q-Routing. Certain properties of forward and backward exploration that form the basis of these update rules are stated and proved in this work.

Experiments over several network topologies, including a 36-node irregular grid and a 128-node 7-D hypercube, indicate that the improvement in quality and the increase in quantity of exploration contribute in complementary ways to the performance of the overall routing algorithm. CDRQ-Routing was able to learn optimal shortest-path routing at low loads and efficient routing policies at medium loads almost twice as fast as Q-Routing. At high load levels, the routing policy learned by CDRQ-Routing was twice as good as that learned by Q-Routing in terms of average packet delivery time. CDRQ-
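To make the forward/backward exploration and the confidence-weighted learning rate concrete, the sketch below gives one possible Python rendering of a single packet hop. It assumes the standard Q-Routing quantities (per-neighbor delivery-time estimates, a queueing-plus-transmission delay for the hop) and an illustrative form of the confidence-based learning rate; the class, the function names, and the exact update formulas are assumptions made for illustration, not the thesis's definitive rules.

\begin{verbatim}
class Node:
    """Sketch of per-node routing state for CDRQ-Routing.

    Q[(y, d)] estimates how long it takes to deliver a packet to
    destination d via neighbor y; C[(y, d)] in [0, 1] is the confidence
    attached to that estimate. Both forms follow the abstract; the
    concrete numbers below are illustrative.
    """

    def __init__(self, node_id, neighbors, destinations):
        self.id = node_id
        self.neighbors = list(neighbors)
        self.Q = {(y, d): 0.0 for y in self.neighbors for d in destinations}
        self.C = {(y, d): 0.0 for y in self.neighbors for d in destinations}

    def best_estimate(self, d):
        """Best delivery-time estimate for destination d and its confidence."""
        if d == self.id:
            return 0.0, 1.0                      # already at the destination
        y = min(self.neighbors, key=lambda n: self.Q[(n, d)])
        return self.Q[(y, d)], self.C[(y, d)]

    def learning_rate(self, c_old, c_est):
        # Assumed shape of the confidence-based rate: trust the incoming
        # estimate more when it is confident or when the stored value is not.
        return max(c_est, 1.0 - c_old)

    def update(self, via, dest, estimate, c_est, delay):
        """Move Q(via, dest) toward delay + estimate, weighted by confidence."""
        key = (via, dest)
        eta = self.learning_rate(self.C[key], c_est)
        self.Q[key] += eta * (delay + estimate - self.Q[key])
        self.C[key] += eta * (c_est - self.C[key])


def hop(sender, receiver, source, dest, delay):
    """One packet hop: two Q-value updates instead of Q-Routing's one."""
    # Forward exploration (as in Q-Routing): the receiving node reports its
    # best estimate for the packet's destination back to the sender.
    est, conf = receiver.best_estimate(dest)
    sender.update(receiver.id, dest, est, conf, delay)
    # Backward exploration (added in CDRQ-Routing): the packet carries the
    # sender's best estimate for the packet's source, which the receiver
    # uses to refine its own estimate toward that source.
    est, conf = sender.best_estimate(source)
    receiver.update(sender.id, source, est, conf, delay)
\end{verbatim}

The two calls to \texttt{update} inside \texttt{hop} correspond to the two Q-value updates per packet hop that the abstract contrasts with Q-Routing's single forward update; any additional bookkeeping on the confidence values is omitted from this sketch.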