Towards Human-Level Learning of Complex Physical Puzzles

Humans quickly learn to solve tasks in novel systems with complex dynamics, without requiring much interaction. While deep reinforcement learning algorithms have achieved tremendous success on many complex tasks, they require a large number of samples to learn meaningful policies. In this paper, we present the task of navigating a marble to the center of a circular maze. While this system is intuitive and easy for humans to solve, standard reinforcement learning algorithms find it very difficult and sample-inefficient to learn meaningful policies for it. We present a model that learns to move the marble through this complex environment within minutes of interaction with the real system. Learning consists of initializing a physics engine with parameters estimated from data collected on the real system. The residual error of the physics engine is then corrected using Gaussian process regression, which models the discrepancy between real observations and the engine's predictions. The physics engine, augmented with the residual model, is then used to control the marble in the maze via model-predictive feedback over a receding horizon. We compare the learning behavior of our method against the time taken by humans to solve the same task and observe comparable performance. To the best of our knowledge, this is the first time that a hybrid model, consisting of a full physics engine and a statistical function approximator, has been used to control a complex physical system in real time using nonlinear model-predictive control (NMPC). Code for the simulation environment can be downloaded at this https URL. A video describing our method can be found at this https URL.
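To make the pipeline concrete, below is a minimal, self-contained Python sketch of the hybrid-model idea: a nominal simulator is corrected by a Gaussian process fit to the residual between logged "real" transitions and the simulator's predictions, and the corrected model then drives a receding-horizon controller. Everything here is illustrative, not the authors' implementation: the toy point-mass sim_step stands in for the full physics engine, real_step for the physical maze, DT and all other names are assumptions, and simple random shooting replaces the paper's nonlinear MPC solver.

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

DT = 0.02  # control period in seconds (assumed)
rng = np.random.default_rng(0)

def sim_step(state, action):
    # Nominal model: a toy point mass standing in for the physics engine.
    # state = [x, y, vx, vy], action = [ax, ay] (commanded acceleration).
    x, y, vx, vy = state
    ax, ay = action
    return np.array([x + vx * DT, y + vy * DT, vx + ax * DT, vy + ay * DT])

def real_step(state, action):
    # Stand-in for the real system: same dynamics plus friction that the
    # nominal model does not capture.
    nxt = sim_step(state, action)
    nxt[2:] *= 0.95
    return nxt

# 1. Fit a GP to the residual between real transitions and the simulator.
states = rng.uniform(-1.0, 1.0, size=(200, 4))
actions = rng.uniform(-1.0, 1.0, size=(200, 2))
X = np.hstack([states, actions])                # GP input: (state, action)
Y = np.array([real_step(s, a) - sim_step(s, a)  # GP target: model error
              for s, a in zip(states, actions)])
gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)
gp.fit(X, Y)

def hybrid_step(states, actions):
    # Batched hybrid prediction: physics engine plus learned GP residual.
    nominal = np.array([sim_step(s, a) for s, a in zip(states, actions)])
    return nominal + gp.predict(np.hstack([states, actions]))

# 2. Receding-horizon control with the hybrid model (random shooting).
def mpc_action(state, goal, horizon=8, n_samples=128):
    seqs = rng.uniform(-1.0, 1.0, size=(n_samples, horizon, 2))
    s = np.tile(state, (n_samples, 1))
    costs = np.zeros(n_samples)
    for t in range(horizon):
        s = hybrid_step(s, seqs[:, t, :])
        costs += np.linalg.norm(s[:, :2] - goal, axis=1)  # distance-to-goal cost
    return seqs[np.argmin(costs), 0]  # execute only the first action

state, goal = np.array([1.0, 1.0, 0.0, 0.0]), np.zeros(2)
for _ in range(50):
    state = real_step(state, mpc_action(state, goal))
print("final distance to goal:", np.linalg.norm(state[:2] - goal))

Random shooting is used here purely for brevity; the paper's real-time controller solves a nonlinear MPC problem over the receding horizon. The structure is the same, though: roll candidate action sequences forward through the hybrid model, score them against the goal, and execute only the first action before replanning.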
