Policies Modulating Trajectory Generators

We propose an architecture for learning complex controllable behaviors by having simple Policies Modulate Trajectory Generators (PMTG), a powerful combination that can provide both memory and prior knowledge to the controller. The result is a flexible architecture that is applicable to a class of problems with periodic motion for which one has insight into the class of trajectories that might lead to a desired behavior. We illustrate the basics of our architecture using a synthetic control problem, then go on to learn speed-controlled locomotion for a quadrupedal robot using Deep Reinforcement Learning and Evolution Strategies. We demonstrate that a simple linear policy, when paired with a parametric Trajectory Generator for quadrupedal gaits, can induce walking behaviors with controllable speed from 4-dimensional IMU observations alone, and can be learned in fewer than 1000 rollouts. We also transfer these policies to a real robot and show locomotion with controllable forward velocity.
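To make the architecture concrete, the control loop described above can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the TG parameterization (a sinusoid with per-joint phase offsets modulated by frequency and amplitude), the observation layout, and all names here (`TrajectoryGenerator`, `pmtg_step`, the offset scheme) are assumptions for illustration. The key ideas it shows are the paper's: the policy is linear, it modulates TG parameters and adds residual corrections on top of the TG output, and the TG phase is fed back to the policy so that a memoryless linear policy gains state.

```python
import numpy as np

class TrajectoryGenerator:
    """Hypothetical periodic trajectory generator with a modulatable phase.

    The real PMTG trajectory generator encodes quadrupedal gaits; here a
    sinusoid with fixed per-joint phase offsets stands in for it.
    """

    def __init__(self, num_joints=8):
        self.phase = 0.0
        self.num_joints = num_joints
        # Fixed phase offsets spread the joints over one gait cycle.
        self.offsets = np.linspace(0.0, 2.0 * np.pi, num_joints, endpoint=False)

    def step(self, frequency, amplitude, dt=0.01):
        # The policy modulates frequency and amplitude at every control step.
        self.phase = (self.phase + 2.0 * np.pi * frequency * dt) % (2.0 * np.pi)
        return amplitude * np.sin(self.phase + self.offsets)


def pmtg_step(weights, tg, imu_obs):
    """One control step: linear policy modulates the TG and adds corrections."""
    # Feeding the TG phase back to the policy gives it memory of the gait cycle.
    obs = np.concatenate([imu_obs, [np.sin(tg.phase), np.cos(tg.phase)]])
    out = weights @ obs  # a simple linear policy, as in the paper
    # First two outputs modulate TG parameters around nominal values
    # (the nominal values and output split are illustrative assumptions).
    frequency = 1.0 + out[0]
    amplitude = 0.5 + out[1]
    corrections = out[2:]
    # Final joint targets: TG trajectory plus learned residual corrections.
    return tg.step(frequency, amplitude) + corrections
```

A learning algorithm (e.g. an evolution strategy) would then optimize only the entries of `weights`; with a zero weight matrix the controller falls back to the open-loop trajectory generator.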
