Transfer from Simulation to Real World through Learning Deep Inverse Dynamics Model

Developing control policies in simulation is often more practical and safer than running experiments directly in the real world. This applies to policies obtained from planning and optimization, and even more so to policies obtained from reinforcement learning, which often demands large amounts of data. However, a policy that succeeds in simulation often fails when deployed on a real robot. Nevertheless, the overall gist of what the policy does in simulation often remains valid in the real world. In this paper we investigate such settings, where the sequence of states traversed in simulation remains reasonable for the real world even if the details of the controls do not, as can be the case when the key differences lie in detailed friction, contact, mass, and geometry properties. During execution, at each time step our approach computes what the simulation-based control policy would do, but rather than executing these controls on the real robot, it computes the next state(s) the simulation expects to result, and then relies on a learned deep inverse dynamics model to decide which real-world action is most suitable for reaching those next states. Deep models are only as good as their training data, so we also propose a data collection approach for (incrementally) learning the deep inverse dynamics model. Our experiments show that our approach compares favorably with various baselines developed for dealing with simulation-to-real-world model discrepancy, including output error control and Gaussian dynamics adaptation.
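
To make this control loop concrete, below is a minimal sketch of one execution step, assuming hypothetical callables sim_policy, sim_forward, inverse_model, and an environment env; these names are illustrative placeholders for the components the abstract describes, not interfaces from the paper's implementation.

def transfer_control_step(state, sim_policy, sim_forward, inverse_model):
    """One time step of simulation-to-real transfer via inverse dynamics.

    state:         current state of the real system
    sim_policy:    policy trained in simulation; maps state -> action
    sim_forward:   simulator's forward dynamics; maps (state, action)
                   to the next state the simulation expects
    inverse_model: learned deep inverse dynamics model; maps
                   (state, desired next state) to a real-world action
    """
    # 1. Ask the simulation-trained policy what it would do here.
    sim_action = sim_policy(state)

    # 2. Predict the next state the simulator expects from that action;
    #    this state sequence is what is trusted to remain valid on the
    #    real system.
    desired_next_state = sim_forward(state, sim_action)

    # 3. Instead of executing sim_action, let the inverse dynamics
    #    model choose the real-world action that best reaches the
    #    desired next state on the physical system.
    return inverse_model(state, desired_next_state)


def run_episode(env, sim_policy, sim_forward, inverse_model, horizon=200):
    """Roll out the transfer controller on the real system."""
    state = env.reset()
    for _ in range(horizon):
        action = transfer_control_step(
            state, sim_policy, sim_forward, inverse_model)
        # Hypothetical environment interface returning (state, done).
        state, done = env.step(action)
        if done:
            break

Note that in this sketch the simulator serves only as a reference model supplying target next states; every action actually executed on the robot comes from the learned inverse dynamics model.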
