Automated Design of Adaptive Controllers for Modular Robots using Reinforcement Learning

Designing distributed controllers for self-reconfiguring modular robots has proven consistently challenging. We have developed a reinforcement learning approach that can both automate controller design and adapt robot behavior on-line. In this paper, we report on our study of reinforcement learning in the domain of self-reconfigurable modular robots: the underlying assumptions, the applicable algorithms, and the issues of partial observability, large search spaces, and local optima. We propose, and validate experimentally in simulation, a number of techniques designed to address these and other scalability issues that arise when applying machine learning to distributed systems such as modular robots. We discuss ways to make learning faster, more robust, and amenable to on-line application by giving scaffolding to the learning agents in the form of policy representation, structured experience, and additional information. Given enough such structure, modular robots can run learning algorithms both to automate the generation of distributed controllers and to adapt to a changing environment, delivering on the promise of self-organization with less interference from human designers, programmers, and operators.
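To make the setting concrete, the following is a minimal, hypothetical sketch of one learning agent of the kind the abstract describes: each module maintains a stochastic policy over its own local observations and performs an episodic policy-gradient update (gradient ascent in policy space) from a shared global reward. All class names, parameters, and the softmax parameterization here are illustrative assumptions for exposition, not the paper's actual implementation.

```python
import math
import random
from collections import defaultdict

class ModuleAgent:
    """One learning agent per robot module (illustrative sketch).

    Policy: softmax over per-(observation, action) preference
    parameters theta, updated at episode end by a policy-gradient
    step scaled by the shared global reward.
    """
    def __init__(self, actions, alpha=0.1):
        self.actions = actions
        self.alpha = alpha                   # learning rate
        self.theta = defaultdict(float)      # theta[(obs, a)]
        self.counts = defaultdict(int)       # N(obs, a) this episode
        self.obs_counts = defaultdict(int)   # N(obs) this episode

    def policy(self, obs):
        """Softmax action probabilities for local observation obs."""
        prefs = [self.theta[(obs, a)] for a in self.actions]
        m = max(prefs)
        exps = [math.exp(p - m) for p in prefs]
        z = sum(exps)
        return [e / z for e in exps]

    def act(self, obs):
        """Sample an action and record it for the episodic update."""
        probs = self.policy(obs)
        r, cum = random.random(), 0.0
        choice = self.actions[-1]
        for a, p in zip(self.actions, probs):
            cum += p
            if r <= cum:
                choice = a
                break
        self.obs_counts[obs] += 1
        self.counts[(obs, choice)] += 1
        return choice

    def update(self, episode_reward):
        """End-of-episode gradient step using the count-based gradient
        d log pi / d theta(obs, a) = N(obs, a) - pi(a|obs) * N(obs)."""
        for obs in list(self.obs_counts):
            probs = self.policy(obs)
            for a, p in zip(self.actions, probs):
                grad = self.counts[(obs, a)] - p * self.obs_counts[obs]
                self.theta[(obs, a)] += self.alpha * episode_reward * grad
        self.counts.clear()
        self.obs_counts.clear()
```

In a distributed setting, every module would run its own copy of such an agent on purely local observations, with only the scalar episode reward shared globally; the "scaffolding" the abstract mentions corresponds to choices such as the policy representation and how experience is structured across episodes.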
