Sim-to-Real Transfer of Robotic Control with Dynamics Randomization

Simulations are attractive environments for training agents, as they provide an abundant source of data and alleviate certain safety concerns during the training process. However, the behaviours developed by agents in simulation are often specific to the characteristics of the simulator. Due to modeling error, strategies that are successful in simulation may not transfer to their real-world counterparts. In this paper, we demonstrate a simple method to bridge this “reality gap”. By randomizing the dynamics of the simulator during training, we are able to develop policies that are capable of adapting to very different dynamics, including ones that differ significantly from the dynamics on which the policies were trained. This adaptivity enables the policies to generalize to the dynamics of the real world without any training on the physical system. Our approach is demonstrated on an object-pushing task using a robotic arm. Despite being trained exclusively in simulation, our policies are able to maintain a similar level of performance when deployed on a real robot, reliably moving an object to a desired location from random initial configurations. We explore the impact of various design decisions and show that the resulting policies are robust to significant calibration error.
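The core of the method is to resample the simulator's dynamics parameters at the start of every training episode, so the policy must succeed under a distribution of plausible physics rather than one fixed model. The sketch below illustrates this training loop under stated assumptions: the `PushEnv` interface (`set_dynamics`, `reset`, `step`), the parameter names, and the sampling ranges are hypothetical placeholders, not the paper's exact randomization set.

```python
import numpy as np

# Illustrative ranges for the randomized dynamics parameters.
# The paper randomizes quantities such as link masses, friction,
# damping, and latency; these names and bounds are assumptions.
PARAM_RANGES = {
    "object_mass":   (0.1, 1.0),   # kg
    "friction":      (0.3, 1.2),   # sliding friction coefficient
    "joint_damping": (0.2, 2.0),   # N*m*s/rad
    "latency":       (0.00, 0.04), # observation delay, seconds
}

def sample_dynamics(rng):
    """Draw one set of dynamics parameters uniformly from their ranges."""
    return {name: rng.uniform(lo, hi) for name, (lo, hi) in PARAM_RANGES.items()}

def collect_episode(env, policy, rng):
    """Run one episode under freshly randomized dynamics.

    `env` is a hypothetical simulator wrapper exposing set_dynamics(),
    reset(), and step(); `policy` maps observations to actions.
    """
    env.set_dynamics(sample_dynamics(rng))  # new physics every episode
    obs = env.reset()
    trajectory = []
    done = False
    while not done:
        action = policy(obs)
        next_obs, reward, done = env.step(action)
        trajectory.append((obs, action, reward))
        obs = next_obs
    return trajectory  # consumed by any standard RL update
```

Because the physics change every episode, a policy that memorizes one simulator's quirks performs poorly on average; the optimization pressure instead favours behaviours that work across the whole parameter distribution, which is what allows zero-shot transfer to the (unseen) real-world dynamics.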
