Using feedback in collaborative reinforcement learning to adaptively optimize MANET routing

Designers face many system optimization problems when building distributed systems. Traditionally, they have relied on optimization techniques that require either prior knowledge of the system's environment or centrally managed runtime knowledge of it, but such techniques are not viable in dynamic networks, where topology, resources, and node availability are subject to frequent and unpredictable change. To address this problem, we propose collaborative reinforcement learning (CRL), a technique that enables groups of reinforcement learning agents to solve system optimization problems online in dynamic, decentralized networks. We evaluate an implementation of CRL in a routing protocol for mobile ad hoc networks called SAMPLE. Simulation results show how feedback in the routing agents' link selection enables SAMPLE to adapt and optimize its routing behavior under varying network conditions and properties, improving network throughput. In the experiments, SAMPLE displays emergent properties such as traffic flows that exploit stable routes and reroute around areas of wireless interference or congestion. SAMPLE is an example of a complex adaptive distributed system.
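The abstract does not give SAMPLE's update equations, but the feedback-driven link selection it describes follows the general shape of distributed Q-routing: each node keeps per-neighbor cost estimates, routes through the cheapest link, and refines its estimates from neighbors' advertised costs. The sketch below illustrates that pattern under stated assumptions; the class name, parameters, and update rule are illustrative, not SAMPLE's actual CRL model, which uses richer link-quality feedback.

```python
import random


class RoutingAgent:
    """Minimal Q-routing-style agent: one per node, learning per-neighbor
    delivery-cost estimates. Illustrative only; SAMPLE's CRL model is richer."""

    def __init__(self, neighbors, alpha=0.5, initial_cost=5.0):
        self.alpha = alpha  # learning rate
        # q[n] = estimated cost of delivering a packet via neighbor n
        self.q = {n: initial_cost for n in neighbors}

    def choose_next_hop(self, epsilon=0.1):
        # Epsilon-greedy link selection: mostly exploit the cheapest link,
        # occasionally explore so estimates can track a changing topology.
        if random.random() < epsilon:
            return random.choice(list(self.q))
        return min(self.q, key=self.q.get)

    def feedback(self, neighbor, link_cost, neighbor_estimate):
        # Collaborative update: the neighbor advertises its own estimated
        # remaining cost; blend it with the observed cost of the link to it.
        target = link_cost + neighbor_estimate
        self.q[neighbor] += self.alpha * (target - self.q[neighbor])
```

Because every node updates only from local observations and neighbor advertisements, no central knowledge of the network is needed, which is the property the abstract argues makes CRL suitable for MANETs.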