Fast Model Identification via Physics Engines for Data-Efficient Policy Search

This paper presents a method for identifying the mechanical parameters of robots or objects, such as their masses and friction coefficients. Its key features are the use of off-the-shelf physics engines and the adaptation of a Bayesian optimization technique to minimize the number of real-world experiments needed for model-based reinforcement learning. The proposed framework reproduces, in a physics engine, experiments performed on a real robot, and optimizes the model's mechanical parameters so as to match the real-world trajectories. The optimized model is then used to learn a policy in simulation before real-world deployment. It is well understood, however, that real trajectories are difficult to reproduce exactly in simulation. Moreover, a near-optimal policy can frequently be found even with an imperfect model. This work therefore proposes a strategy for identifying a model that is just good enough to approximate the value of a locally optimal policy with a given confidence, instead of spending effort on identifying the most accurate model possible. Evaluations, performed both in simulation and on a real robotic manipulation task, show that the proposed strategy yields an overall time-efficient, integrated model identification and learning solution, which significantly improves the data efficiency of existing policy search algorithms.
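
The identification loop described above is easy to picture in code. Below is a minimal sketch, not the authors' implementation: the physics engine is replaced by a toy point-mass-with-friction simulator, and scikit-optimize's `gp_minimize` stands in for the entropy-search-based Bayesian optimization used in the paper. All names and parameter bounds are illustrative.

```python
# Sketch of model identification via a simulator and Bayesian optimization.
# Assumptions: a toy 1-D point-mass simulator replaces the physics engine,
# and skopt's gp_minimize replaces the paper's entropy-search acquisition.
import numpy as np
from skopt import gp_minimize

DT, T = 0.05, 40  # integration step (s) and rollout horizon (steps)

def simulate(mass, friction, forces):
    """Roll out a 1-D point mass under applied forces with viscous friction;
    stand-in for reproducing a robot experiment in a physics engine."""
    x, v, traj = 0.0, 0.0, []
    for f in forces:
        a = (f - friction * v) / mass
        v += a * DT
        x += v * DT
        traj.append(x)
    return np.array(traj)

# "Real-world" data: rollouts generated with the true (unknown) parameters.
rng = np.random.default_rng(0)
true_mass, true_friction = 2.3, 0.7
action_seqs = [rng.uniform(-1.0, 1.0, size=T) for _ in range(5)]
real_trajs = [simulate(true_mass, true_friction, u) for u in action_seqs]

def loss(params):
    """Mean squared discrepancy between simulated and observed trajectories."""
    mass, friction = params
    return float(np.mean([np.mean((simulate(mass, friction, u) - traj) ** 2)
                          for u, traj in zip(action_seqs, real_trajs)]))

# Bayesian optimization over the mechanical parameters (mass, friction).
result = gp_minimize(loss, dimensions=[(0.1, 5.0), (0.0, 2.0)],
                     n_calls=40, random_state=0)
print("identified mass, friction:", result.x, "residual:", result.fun)
```

In the paper's setting the stopping criterion would not be a fixed call budget: the search terminates once the identified model predicts the value of the current policy to within the desired confidence, which is what keeps the number of real-world experiments small.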
