Efficient Distributed Reinforcement Learning through Agreement

Distributed robotic systems can benefit from automatic controller design and online adaptation by reinforcement learning (RL), but often suffer from the limitations of partial observability. In this paper, we address the twin problems encountered in such systems: limited local experience, and reward signals that are observed locally but are not necessarily informative. We combine direct search in policy space with an agreement algorithm to efficiently exchange local rewards and experience among agents. We demonstrate improved learning ability on the locomotion problem for self-reconfiguring modular robots in simulation, and show that a fully distributed implementation can learn good policies just as fast as the centralized implementation. Our results suggest that prior work on centralized RL algorithms for modular robots may be made effective in practice through the application of agreement algorithms. This approach could be fruitful in many cooperative situations in which robots need to learn similar behaviors but have access only to local information.
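
The abstract does not spell out the agreement algorithm itself, so the following is only a minimal sketch of one standard form of agreement: linear averaging with neighbors, in which each agent repeatedly nudges its reward estimate toward those of its neighbors until all agents hold approximately the global mean. The chain topology, the step size `alpha`, the iteration count, and all function names are assumptions made for illustration and are not taken from the paper.

```python
# Hypothetical sketch of agreement by neighbor averaging (not the paper's code).

def chain_neighbors(n):
    """Assumed topology: agent i talks only to agents i-1 and i+1."""
    return [[j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)]


def agreement_step(values, neighbors, alpha):
    """One synchronous round: each agent moves toward its neighbors' values.

    With symmetric links the sum of all values is preserved, so the only
    possible consensus value is the mean of the initial values.
    """
    return [
        v + alpha * sum(values[j] - v for j in neighbors[i])
        for i, v in enumerate(values)
    ]


def run_agreement(values, neighbors, alpha=0.3, iterations=200):
    """Repeat the averaging round a fixed number of times.

    alpha is assumed to be below 1 / (maximum number of neighbors), which
    keeps every agent's weight on its own value positive.
    """
    for _ in range(iterations):
        values = agreement_step(values, neighbors, alpha)
    return values


# Each agent starts with only its own locally observed reward.
local_rewards = [0.0, 0.0, 4.0, 0.0, 1.0]
estimates = run_agreement(local_rewards, chain_neighbors(len(local_rewards)))
print(estimates)
```

In this sketch every agent ends with an estimate close to the mean of all local rewards, which it could then use in place of its own local reward in a policy-search update. How the paper couples agreement with the policy update, and whether it runs synchronously, is not stated in the abstract.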
