Learning Reactive and Predictive Differentiable Controllers for Switching Linear Dynamical Models

Humans leverage the dynamics of the environment and their own bodies to accomplish challenging tasks such as grasping an object while walking past it or pushing off a wall to turn a corner. Such tasks often involve switching dynamics as the robot makes and breaks contact. Learning these dynamics is challenging and prone to model inaccuracies, especially near contact regions. In this work, we present a framework for learning composite dynamical behaviors from expert demonstrations. We learn a switching linear dynamical model, with contacts encoded in the switching conditions, as a close approximation of the system dynamics. We then use discrete-time LQR as a differentiable policy class for data-efficient learning of a control strategy that operates over multiple dynamical modes and accounts for discontinuities due to contact. In addition to predicting interactions with the environment, our policy reacts effectively to inaccurate predictions such as unanticipated contacts. Through simulation and real-world experiments, we demonstrate that the learned behaviors generalize to different scenarios and are robust to model inaccuracies during execution.

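To make the two ingredients of the abstract concrete, the sketch below is a minimal illustration, not the authors' implementation: all matrices, dimensions, the switching threshold, and the choice of PyTorch are assumptions made for the example. It shows (i) a switching linear dynamical model whose mode is selected by a contact-like condition on the state, and (ii) a finite-horizon discrete-time LQR policy computed by a backward Riccati recursion, written with differentiable tensor operations so the feedback gains carry gradients with respect to the cost parameters.

import torch

def lqr_gains(A, B, Q, R, horizon):
    """Backward Riccati recursion; returns the time-varying feedback gains K_t."""
    P = Q.clone()
    gains = []
    for _ in range(horizon):
        BtP = B.T @ P
        K = torch.linalg.solve(R + BtP @ B, BtP @ A)  # K_t = (R + B'PB)^{-1} B'PA
        P = Q + A.T @ P @ (A - B @ K)                 # Riccati update
        gains.append(K)
    return list(reversed(gains))

def step(x, u):
    """Switching linear dynamics with a free-space mode and a 'contact' mode (illustrative)."""
    A_free    = torch.tensor([[1.0, 0.1], [0.0, 1.0]])
    A_contact = torch.tensor([[1.0, 0.0], [0.0, 0.2]])  # velocity damped after contact
    B = torch.tensor([[0.0], [0.1]])
    A = A_contact if x[0] > 1.0 else A_free             # switching condition on position
    return A @ x + B @ u

# Cost parameters to be learned; gradients flow into them through the LQR gains.
Q = torch.eye(2, requires_grad=True)
R = torch.eye(1, requires_grad=True)
A_free = torch.tensor([[1.0, 0.1], [0.0, 1.0]])
B = torch.tensor([[0.0], [0.1]])
gains = lqr_gains(A_free, B, Q, R, horizon=20)

# Closed-loop rollout through the switching dynamics with a simple task loss.
x = torch.tensor([0.0, 0.5])
loss = torch.tensor(0.0)
for K in gains:
    u = -K @ x
    x = step(x, u)
    loss = loss + x @ x + 0.01 * (u @ u)
loss.backward()  # d(loss)/dQ and d(loss)/dR, obtained through the differentiable controller

Because the gains come out of differentiable operations, a task loss evaluated on the closed-loop rollout can be backpropagated into the cost parameters; this is the sense in which LQR serves as a differentiable policy class in the abstract.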