Policies Modulating Trajectory Generators

We propose an architecture for learning complex controllable behaviors by having simple Policies Modulate Trajectory Generators (PMTG), a powerful combination that can provide both memory and prior knowledge to the controller. The result is a flexible architecture that is applicable to a class of problems with periodic motion for which one has insight into the class of trajectories that might lead to a desired behavior. We illustrate the basics of our architecture using a synthetic control problem, then go on to learn speed-controlled locomotion for a quadrupedal robot using Deep Reinforcement Learning and Evolution Strategies. We demonstrate that a simple linear policy, when paired with a parametric Trajectory Generator for quadrupedal gaits, can induce walking behaviors with controllable speed from 4-dimensional IMU observations alone, and can be learned in fewer than 1000 rollouts. We also transfer these policies to a real robot and show locomotion with controllable forward velocity.
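To make the architecture concrete, the control loop described above can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the TG parameterization (a sinusoid with per-joint phase offsets modulated by frequency and amplitude), the observation layout, and all names here (`TrajectoryGenerator`, `pmtg_step`, the offset scheme) are assumptions for illustration. The key ideas it shows are the paper's: the policy is linear, it modulates TG parameters and adds residual corrections on top of the TG output, and the TG phase is fed back to the policy so that a memoryless linear policy gains state.

```python
import numpy as np

class TrajectoryGenerator:
    """Hypothetical periodic trajectory generator with a modulatable phase.

    The real PMTG trajectory generator encodes quadrupedal gaits; here a
    sinusoid with fixed per-joint phase offsets stands in for it.
    """

    def __init__(self, num_joints=8):
        self.phase = 0.0
        self.num_joints = num_joints
        # Fixed phase offsets spread the joints over one gait cycle.
        self.offsets = np.linspace(0.0, 2.0 * np.pi, num_joints, endpoint=False)

    def step(self, frequency, amplitude, dt=0.01):
        # The policy modulates frequency and amplitude at every control step.
        self.phase = (self.phase + 2.0 * np.pi * frequency * dt) % (2.0 * np.pi)
        return amplitude * np.sin(self.phase + self.offsets)


def pmtg_step(weights, tg, imu_obs):
    """One control step: linear policy modulates the TG and adds corrections."""
    # Feeding the TG phase back to the policy gives it memory of the gait cycle.
    obs = np.concatenate([imu_obs, [np.sin(tg.phase), np.cos(tg.phase)]])
    out = weights @ obs  # a simple linear policy, as in the paper
    # First two outputs modulate TG parameters around nominal values
    # (the nominal values and output split are illustrative assumptions).
    frequency = 1.0 + out[0]
    amplitude = 0.5 + out[1]
    corrections = out[2:]
    # Final joint targets: TG trajectory plus learned residual corrections.
    return tg.step(frequency, amplitude) + corrections
```

A learning algorithm (e.g. an evolution strategy) would then optimize only the entries of `weights`; with a zero weight matrix the controller falls back to the open-loop trajectory generator.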
