Neural probabilistic motor primitives for humanoid control

We focus on the problem of learning a single motor module that can flexibly express a range of behaviors for the control of high-dimensional, physically simulated humanoids. To do this, we propose a motor architecture that has the general structure of an inverse model with a latent-variable bottleneck. We show that it is possible to train this model entirely offline to compress thousands of expert policies and learn a motor primitive embedding space. The trained neural probabilistic motor primitive system can perform one-shot imitation of whole-body humanoid behaviors, robustly mimicking unseen trajectories. Additionally, we demonstrate that it is straightforward to train controllers to reuse the learned motor primitive space to solve tasks, and that the resulting movements are relatively naturalistic. To support the training of our model, we compare two approaches for offline policy cloning, including an experience-efficient method that we call linear feedback policy cloning. We encourage readers to view a supplementary video (this https URL) summarizing our results.
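To make the architecture described above concrete, the following is a minimal sketch (not the authors' implementation) of an inverse model with a latent-variable bottleneck trained by offline cloning. All module names, dimensions, and the standard-normal latent regularizer are illustrative assumptions for this sketch; in particular, the paper's actual latent prior and training details may differ. The encoder compresses a short window of future reference states into a stochastic latent z, and the decoder acts as a low-level policy mapping the current state plus z to an action, regressed onto expert actions.

```python
# Hypothetical sketch of an inverse model with a latent bottleneck,
# trained offline by regressing onto expert actions (behavioral cloning).
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Maps a window of future reference states to a Gaussian latent (the 'intention')."""
    def __init__(self, state_dim, horizon, latent_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim * horizon, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.mu = nn.Linear(hidden, latent_dim)
        self.log_std = nn.Linear(hidden, latent_dim)

    def forward(self, future_states):            # future_states: (batch, horizon, state_dim)
        h = self.net(future_states.flatten(1))
        return self.mu(h), self.log_std(h)

class Decoder(nn.Module):
    """Low-level policy: current proprioceptive state + latent -> action."""
    def __init__(self, state_dim, latent_dim, action_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + latent_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim),
        )

    def forward(self, state, z):
        return self.net(torch.cat([state, z], dim=-1))

def cloning_loss(encoder, decoder, state, future_states, expert_action, beta=1e-2):
    """One offline cloning step: reconstruct the expert action through the bottleneck,
    with a KL penalty toward N(0, I) standing in for the latent regularizer."""
    mu, log_std = encoder(future_states)
    z = mu + log_std.exp() * torch.randn_like(mu)            # reparameterized sample
    action = decoder(state, z)
    recon = ((action - expert_action) ** 2).mean()            # regress onto expert action
    kl = 0.5 * (mu ** 2 + (2 * log_std).exp() - 2 * log_std - 1).mean()
    return recon + beta * kl
```

After training on rollouts from many expert policies, the decoder alone can be reused: a new high-level controller would output z directly to steer the frozen low-level policy, which is one reading of the task-reuse experiments summarized above.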
