Online BayesSim for Combined Simulator Parameter Inference and Policy Improvement

Recent advances in Bayesian likelihood-free inference enable a probabilistic treatment of the problem of estimating simulation parameters, and their uncertainty, from sequences of observations. Domain randomization becomes far more effective when a posterior distribution supplies calibrated uncertainty over the parameters of the simulated environment. In this paper, we study the integration of simulation parameter inference with both model-free reinforcement learning and model-based control in a novel sequential algorithm that alternates between refining the parameter estimate and improving the controller. This approach exploits the interdependence between the two problems to yield computational gains and improved reliability when only a black-box simulator is available. Experimental results suggest that both control strategies outperform traditional domain randomization methods.
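
To make the alternation concrete, the sketch below implements a minimal toy version of this loop in NumPy: sample parameters from the current posterior, simulate under the current policy, refit the posterior against the real observation, then improve the controller on posterior draws. Every name (`simulate`, `fit_posterior`, `improve_policy`) and the 1-D dynamics are hypothetical illustrations, not the paper's code; the actual components (a black-box robotics simulator, a BayesSim-style mixture-density-network posterior, an RL or MPC controller update) would replace these stand-ins.

```python
import numpy as np

def simulate(theta, policy, rng):
    """Toy black-box simulator: maps parameters + policy to a rollout summary."""
    return theta * policy + rng.normal(scale=0.1, size=theta.shape)

def fit_posterior(thetas, summaries, obs_summary):
    """Stand-in for likelihood-free inference: weight sampled parameters by
    how well their simulated summaries match the real observation, then
    return a Gaussian fit to the weighted samples."""
    d = np.linalg.norm(summaries - obs_summary, axis=1)
    w = np.exp(-0.5 * (d / 0.1) ** 2) + 1e-12   # kernel bandwidth 0.1
    w /= w.sum()
    mean = (w[:, None] * thetas).sum(axis=0)
    var = (w[:, None] * (thetas - mean) ** 2).sum(axis=0) + 1e-6
    return mean, np.sqrt(var)

def improve_policy(policy, theta_samples):
    """Stand-in for a policy-improvement step (RL or MPC) under domain
    randomization: here, simply nudge the policy toward the posterior mean."""
    return policy + 0.1 * (theta_samples.mean() - policy)

rng = np.random.default_rng(0)
true_theta = np.array([0.7])                             # unknown real-world parameter
obs_summary = simulate(true_theta, policy=1.0, rng=rng)  # "real" observation

policy, mean, std = 1.0, np.zeros(1), np.ones(1)         # broad initial prior
for it in range(5):
    # 1) Inference: sample parameters, simulate under the current policy,
    #    and refit the posterior to the real observation.
    thetas = rng.normal(mean, std, size=(256, 1))
    summaries = np.stack([simulate(t, policy, rng) for t in thetas])
    mean, std = fit_posterior(thetas, summaries, obs_summary)
    # 2) Control: improve the policy on parameters drawn from the posterior
    #    (domain randomization with calibrated, rather than fixed, ranges).
    policy = improve_policy(policy, rng.normal(mean, std, size=(64, 1)))
    print(f"iter {it}: posterior mean={mean[0]:.3f}  policy={policy:.3f}")
```

The loop encodes the interdependence the abstract highlights: each policy update changes the data the next inference step sees, and each posterior update changes the randomization distribution the next policy update trains on.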
