Neural Dynamic Policies for End-to-End Sensorimotor Learning

The current dominant paradigm in sensorimotor control, whether imitation or reinforcement learning, is to train policies directly in raw action spaces such as torques, joint angles, or end-effector positions. This forces the agent to make a decision at every single timestep during training, which limits scalability to continuous, high-dimensional, and long-horizon tasks. In contrast, classical robotics research has long exploited dynamical systems as a policy representation for learning robot behaviors from demonstrations. These techniques, however, lack the flexibility and generalizability of deep learning and reinforcement learning, and they remain under-explored in such settings. In this work, we begin to close this gap by embedding the structure of a dynamical system into deep neural network-based policies, reparameterizing the action space via second-order differential equations. We propose Neural Dynamic Policies (NDPs), which make predictions in trajectory distribution space, in contrast to prior policy learning methods whose actions represent the raw control space. This embedded structure enables end-to-end policy learning in both reinforcement and imitation learning setups. We show that NDPs outperform the prior state of the art in either efficiency or performance across several robotic control tasks in both setups. Project video and code are available at this https URL.
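To make the idea concrete, here is a minimal sketch of how such a policy might look: a network maps an observation to the parameters of a dynamic movement primitive (a goal attractor and radial-basis-function weights), and the resulting second-order differential equation is unrolled with Euler integration to produce a trajectory. This is an illustrative reconstruction, not the authors' released code; the class name NDP, the network sizes, the basis-function placement, and the integration scheme are all assumptions.

```python
# Illustrative sketch only: a policy network predicts the parameters of a
# dynamic movement primitive (DMP), and the DMP's second-order ODE is
# unrolled differentiably to produce a trajectory. Names and hyperparameters
# are assumptions, not the paper's released implementation.
import torch
import torch.nn as nn


class NDP(nn.Module):
    def __init__(self, obs_dim, dof, n_basis=10, alpha=25.0):
        super().__init__()
        self.dof, self.n_basis = dof, n_basis
        self.alpha, self.beta, self.ax = alpha, alpha / 4.0, 1.0
        # Radial basis functions spaced along the exponentially decaying
        # phase variable; widths set by a common heuristic.
        c = torch.exp(-self.ax * torch.linspace(0, 1, n_basis))
        self.register_buffer("c", c)
        self.register_buffer("h", n_basis / c ** 2)
        # Policy network: observation -> DMP basis weights and goal.
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 128), nn.ReLU(),
            nn.Linear(128, dof * n_basis + dof),
        )

    def forward(self, obs, y0, steps=50, dt=0.01):
        """obs: (B, obs_dim), y0: (B, dof) start state -> (B, steps, dof)."""
        B = obs.shape[0]
        out = self.net(obs)
        w = out[:, : self.dof * self.n_basis].reshape(B, self.dof, self.n_basis)
        g = out[:, self.dof * self.n_basis:]          # predicted goal attractor
        y, yd = y0, torch.zeros_like(y0)              # position and velocity
        x = torch.ones(B, 1, device=obs.device)       # phase variable
        traj = []
        for _ in range(steps):
            psi = torch.exp(-self.h * (x - self.c) ** 2)        # (B, n_basis)
            f = (w @ psi.unsqueeze(-1)).squeeze(-1)             # forcing term
            f = f * x * (g - y0) / (psi.sum(-1, keepdim=True) + 1e-8)
            # Second-order attractor dynamics: ydd = a(b(g - y) - yd) + f
            ydd = self.alpha * (self.beta * (g - y) - yd) + f
            yd = yd + ydd * dt
            y = y + yd * dt
            x = x - self.ax * x * dt                  # canonical-system decay
            traj.append(y)
        return torch.stack(traj, dim=1)
```

Because the unrolled integration is written entirely in differentiable tensor operations, gradients from an imitation loss on the output trajectory, or from a reinforcement learning objective, can flow back through the dynamical system into the network weights, which is what enables end-to-end training. The predicted trajectory would then typically be tracked by a low-level controller (e.g., PD control) on the robot.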
