Reinforcement learning with function approximation for cooperative navigation tasks

In this paper, we propose a reinforcement learning approach to multi-robot cooperative navigation tasks with infinite state spaces. The proposed algorithm addresses learning and coordination simultaneously, and extends existing algorithms in the literature to problems with an infinite state space. We also present results obtained in several test scenarios featuring multi-robot navigation under partial observability.
