Distributed Learning for the Decentralized Control of Articulated Mobile Robots

Decentralized control architectures, such as those conventionally defined by central pattern generators, independently coordinate spatially distributed portions of articulated bodies to achieve system-level objectives. State-of-the-art distributed algorithms for reinforcement learning employ a different but conceptually related idea: independent agents simultaneously coordinate their own behaviors in parallel environments while asynchronously updating the policy of a system-level (or, rather, meta-level) agent. This work is, to the best of the authors' knowledge, the first to explicitly explore the relationship between the underlying concepts of homogeneous decentralized control for articulated locomotion and distributed learning. We present an approach that leverages the structure of the asynchronous advantage actor-critic (A3C) algorithm to provide a natural framework for learning decentralized control policies on a single platform. Our primary contribution shows that an individual agent in the A3C algorithm can be defined by an independently controlled portion of the robot's body, thus enabling distributed learning on a single platform for efficient hardware implementation. To this end, we show how the system is trained offline using hardware experiments that implement an autonomous decentralized compliant control framework. Our experimental results show that the trained agent outperforms the compliant control baseline by more than 40% in terms of steady progression through a series of randomized, highly cluttered evaluation environments.
