Reinforcement Learning in Distributed Domains: Beyond Team Games

Distributed search algorithms are crucial for large optimization problems, particularly when a centralized approach is not merely impractical but infeasible. Many machine learning concepts have been applied to search algorithms to improve their effectiveness. In this article we present an algorithm that blends Reinforcement Learning (RL) and hill climbing directly, using the RL signal to guide the exploration step of a hill climbing algorithm. We apply this algorithm to a constellation of communication satellites, where the goal is to minimize the loss of importance-weighted data. We introduce the concept of 'ghost' traffic: when set correctly, this traffic induces the satellites to act so as to optimize the world utility. Our results indicate that the bi-utility search introduced in this article outperforms both traditional hill climbing algorithms and distributed RL approaches such as team games.
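To make the idea of an RL signal guiding the exploration step of hill climbing concrete, the following is a minimal single-agent sketch. It is not the bi-utility search of the article, whose details are not given in this abstract. The function name `rl_guided_hill_climb`, the epsilon-greedy value table, and all parameter values are assumptions chosen for illustration.

```python
import random


def rl_guided_hill_climb(utility, x0, steps=500, step_size=0.1,
                         epsilon=0.1, lr=0.2, seed=0):
    """Hill climbing whose proposal move is picked by a learned value table.

    Hypothetical illustration only: the reward is the change in utility a
    move would produce, and acceptance is plain greedy hill climbing.
    """
    rng = random.Random(seed)
    x = list(x0)
    best = utility(x)
    # One action per (coordinate, direction) pair.
    actions = [(i, d) for i in range(len(x)) for d in (-1.0, 1.0)]
    # q[a] is a running estimate of the improvement obtained by action a.
    q = [0.0] * len(actions)
    for _ in range(steps):
        # Exploration step: epsilon-greedy choice over the learned values.
        if rng.random() < epsilon:
            a = rng.randrange(len(actions))
        else:
            a = max(range(len(actions)), key=q.__getitem__)
        i, d = actions[a]
        candidate = list(x)
        candidate[i] += d * step_size
        value = utility(candidate)
        # RL signal: improvement relative to the current point.
        reward = value - best
        q[a] += lr * (reward - q[a])
        # Hill-climbing acceptance: keep the move only if it improves utility.
        if value > best:
            x, best = candidate, value
    return x, best


if __name__ == "__main__":
    # Toy utility with a single maximum at (1, -2); an assumption for the demo.
    def toy_utility(v):
        return -((v[0] - 1.0) ** 2) - (v[1] + 2.0) ** 2

    point, score = rl_guided_hill_climb(toy_utility, [0.0, 0.0])
    print(point, score)
```

In this sketch the learned values only bias which move is proposed; whether a move is kept is still decided by the utility itself. In a distributed setting each agent would hold its own table and reward signal, which is where the choice of per-agent utility becomes important.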
