End-to-End Stable Imitation Learning via Autonomous Neural Dynamic Policies

State-of-the-art sensorimotor learning algorithms produce policies that can often exhibit unstable behaviors, damaging the robot and/or the environment. Traditional robot learning, by contrast, relies on dynamical-system-based policies that can be analyzed for stability and safety. Such policies, however, are neither flexible nor generic, and they usually operate only on proprioceptive sensor states. In this work, we bridge the gap between generic neural network policies and dynamical-system-based policies by introducing Autonomous Neural Dynamic Policies (ANDPs), which: (a) are based on autonomous dynamical systems, (b) always produce asymptotically stable behaviors, and (c) are more flexible than traditional stable dynamical-system-based policies. ANDPs are fully differentiable, flexible, generic policies that can be used in imitation learning setups while guaranteeing asymptotic stability. In this paper, we explore the flexibility and capacity of ANDPs in several imitation learning tasks, including experiments with image observations. The results show that ANDPs combine the benefits of both neural-network-based and dynamical-system-based methods.
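To make the core idea concrete, the following is a minimal sketch of an asymptotically stable autonomous dynamical-system policy. It is not the ANDP architecture itself; it assumes a standard construction from the stable dynamical-systems literature, where a learned weight matrix `W` is mapped to a positive-definite matrix A = W Wᵀ + εI, so that the velocity field dx/dt = −A (x − x*) is guaranteed to converge to the attractor x* (Lyapunov function V = ‖x − x*‖²). The names `stable_policy`, `W`, and `target` are illustrative.

```python
import numpy as np

def stable_policy(x, target, W, eps=0.5):
    """Velocity command from the autonomous dynamical system
    dx/dt = -A (x - target), with A = W W^T + eps*I.
    A is positive definite by construction, so the system is
    asymptotically stable toward `target` for any learned W."""
    A = W @ W.T + eps * np.eye(len(x))
    return -A @ (x - target)

# Roll out the policy with simple Euler integration: regardless of the
# (here random, standing in for learned) weights W, the state converges
# to the target attractor.
rng = np.random.default_rng(0)
W = rng.standard_normal((2, 2))
x = np.array([1.0, -1.0])
target = np.zeros(2)
dt = 0.01
for _ in range(2000):
    x = x + dt * stable_policy(x, target, W)
```

In a differentiable policy, `W` (and possibly the attractor) would be produced by a neural network and trained end-to-end; the positive-definite parameterization is what preserves the stability guarantee during learning.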
