BayesSim: adaptive domain randomization via probabilistic inference for robotics simulators

We introduce BayesSim, a framework for robotics simulations allowing a full Bayesian treatment for the parameters of the simulator. As simulators become more sophisticated and able to represent the dynamics more accurately, fundamental problems in robotics such as motion planning and perception can be solved in simulation and solutions transferred to the physical robot. However, even the most complex simulator might still not be able to represent reality in all its details either due to inaccurate parametrization or simplistic assumptions in the dynamic models. BayesSim provides a principled framework to reason about the uncertainty of simulation parameters. Given a black box simulator (or generative model) that outputs trajectories of state and action pairs from unknown simulation parameters, followed by trajectories obtained with a physical robot, we develop a likelihood-free inference method that computes the posterior distribution of simulation parameters. This posterior can then be used in problems where Sim2Real is critical, for example in policy search. We compare the performance of BayesSim in obtaining accurate posteriors in a number of classical control and robotics problems. Results show that the posterior computed from BayesSim can be used for domain randomization outperforming alternative methods that randomize based on uniform priors.

[1]  H. D. Miller,et al.  The Theory Of Stochastic Processes , 1977, The Mathematical Gazette.

[2]  David R. Cox,et al.  The Theory of Stochastic Processes , 1967, The Mathematical Gazette.

[3]  E. Braaten,et al.  An Improved Low-Discrepancy Sequence for Multidimensional Quasi-Monte Carlo Integration , 1979 .

[4]  D.G. Dudley,et al.  Dynamic system identification experiment design and data analysis , 1979, Proceedings of the IEEE.

[5]  P. Diggle,et al.  Monte Carlo Methods of Inference for Implicit Statistical Models , 1984 .

[6]  S. Srihari Mixture Density Networks , 1994 .

[7]  M. Feldman,et al.  Population growth of human Y chromosomes: a study of Y chromosome microsatellites. , 1999, Molecular biology and evolution.

[8]  D. Balding,et al.  Approximate Bayesian computation in population genetics. , 2002, Genetics.

[9]  Paul Marjoram,et al.  Markov chain Monte Carlo without likelihoods , 2003, Proceedings of the National Academy of Sciences of the United States of America.

[10]  Long Ji Lin,et al.  Self-improving reactive agents based on reinforcement learning, planning and teaching , 1992, Machine Learning.

[11]  Christopher K. I. Williams,et al.  Gaussian Processes for Machine Learning (Adaptive Computation and Machine Learning) , 2005 .

[12]  Darren J. Wilkinson Stochastic Modelling for Systems Biology , 2006 .

[13]  Benjamin Recht,et al.  Random Features for Large-Scale Kernel Machines , 2007, NIPS.

[14]  Mark M. Tanaka,et al.  Sequential Monte Carlo without likelihoods , 2007, Proceedings of the National Academy of Sciences.

[15]  P. Freeman,et al.  Likelihood-Free Inference in Cosmology: Potential for the Estimation of Luminosity Functions , 2012 .

[16]  Christian P. Robert,et al.  Stochastic Modelling for Systems Biology (second edition) , 2012 .

[17]  Yuval Tassa,et al.  MuJoCo: A physics engine for model-based control , 2012, 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems.

[18]  Mike West,et al.  Sequential Monte Carlo with Adaptive Weights for Approximate Bayesian Computation , 2015, 1503.07791.

[19]  Emanuel Todorov,et al.  Ensemble-CIO: Full-body dynamic motion planning that transfers to physical humanoids , 2015, 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[20]  Tom Schaul,et al.  Universal Value Function Approximators , 2015, ICML.

[21]  Sergey Levine,et al.  Trust Region Policy Optimization , 2015, ICML.

[22]  Nitish Srivastava,et al.  Unsupervised Learning of Video Representations using LSTMs , 2015, ICML.

[23]  Shane Legg,et al.  Human-level control through deep reinforcement learning , 2015, Nature.

[24]  Yuval Tassa,et al.  Continuous control with deep reinforcement learning , 2015, ICLR.

[25]  Vikas Sindhwani,et al.  Quasi-Monte Carlo Feature Maps for Shift-Invariant Kernels , 2014, J. Mach. Learn. Res..

[26]  Iain Murray,et al.  Fast $\epsilon$-free Inference of Simulation Models with Bayesian Conditional Density Estimation , 2016, 1605.06376.

[27]  Tom Schaul,et al.  Prioritized Experience Replay , 2015, ICLR.

[28]  Marcin Andrychowicz,et al.  Hindsight Experience Replay , 2017, NIPS.

[29]  Wojciech Zaremba,et al.  Domain randomization for transferring deep neural networks from simulation to the real world , 2017, 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[30]  Alec Radford,et al.  Proximal Policy Optimization Algorithms , 2017, ArXiv.

[31]  Marcin Andrychowicz,et al.  Sim-to-Real Transfer of Robotic Control with Dynamics Randomization , 2017, 2018 IEEE International Conference on Robotics and Automation (ICRA).

[32]  Dieter Fox,et al.  Deep Object Pose Estimation for Semantic Robotic Grasping of Household Objects , 2018, CoRL.

[33]  Atil Iscen,et al.  Sim-to-Real: Learning Agile Locomotion For Quadruped Robots , 2018, Robotics: Science and Systems.

[34]  Darren J. Wilkinson,et al.  Stochastic Modelling for Systems Biology, Third Edition , 2018 .

[35]  Rae. Z.H. Aliyev,et al.  Interpolation of Spatial Data , 2018, Biomedical Journal of Scientific & Technical Research.

[36]  Yevgen Chebotar,et al.  Closing the Sim-to-Real Loop: Adapting Simulation Randomization with Real World Experience , 2018, 2019 International Conference on Robotics and Automation (ICRA).

[37]  Jakub W. Pachocki,et al.  Learning dexterous in-hand manipulation , 2018, Int. J. Robotics Res..