Online BayesSim for Combined Simulator Parameter Inference and Policy Improvement

Recent advances in Bayesian likelihood-free inference enable a probabilistic treatment of the problem of estimating simulation parameters, and their uncertainty, from sequences of observations. Domain randomization becomes far more effective when a posterior distribution supplies calibrated uncertainty over the parameters of the simulated environment. In this paper, we study the integration of simulation parameter inference with both model-free reinforcement learning and model-based control in a novel sequential algorithm that alternates between refining the parameter estimate and improving the controller. This approach exploits the interdependence between the two problems to yield computational gains and improved reliability when only a black-box simulator is available. Experimental results suggest that both control strategies outperform traditional domain randomization methods.
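
To make the alternation concrete, the sketch below implements a minimal toy version of this loop in NumPy: sample parameters from the current posterior, simulate under the current policy, refit the posterior against the real observation, then improve the controller on posterior draws. Every name (`simulate`, `fit_posterior`, `improve_policy`) and the 1-D dynamics are hypothetical illustrations, not the paper's code; the actual components (a black-box robotics simulator, a BayesSim-style mixture-density-network posterior, an RL or MPC controller update) would replace these stand-ins.

```python
import numpy as np

def simulate(theta, policy, rng):
    """Toy black-box simulator: maps parameters + policy to a rollout summary."""
    return theta * policy + rng.normal(scale=0.1, size=theta.shape)

def fit_posterior(thetas, summaries, obs_summary):
    """Stand-in for likelihood-free inference: weight sampled parameters by
    how well their simulated summaries match the real observation, then
    return a Gaussian fit to the weighted samples."""
    d = np.linalg.norm(summaries - obs_summary, axis=1)
    w = np.exp(-0.5 * (d / 0.1) ** 2) + 1e-12   # kernel bandwidth 0.1
    w /= w.sum()
    mean = (w[:, None] * thetas).sum(axis=0)
    var = (w[:, None] * (thetas - mean) ** 2).sum(axis=0) + 1e-6
    return mean, np.sqrt(var)

def improve_policy(policy, theta_samples):
    """Stand-in for a policy-improvement step (RL or MPC) under domain
    randomization: here, simply nudge the policy toward the posterior mean."""
    return policy + 0.1 * (theta_samples.mean() - policy)

rng = np.random.default_rng(0)
true_theta = np.array([0.7])                             # unknown real-world parameter
obs_summary = simulate(true_theta, policy=1.0, rng=rng)  # "real" observation

policy, mean, std = 1.0, np.zeros(1), np.ones(1)         # broad initial prior
for it in range(5):
    # 1) Inference: sample parameters, simulate under the current policy,
    #    and refit the posterior to the real observation.
    thetas = rng.normal(mean, std, size=(256, 1))
    summaries = np.stack([simulate(t, policy, rng) for t in thetas])
    mean, std = fit_posterior(thetas, summaries, obs_summary)
    # 2) Control: improve the policy on parameters drawn from the posterior
    #    (domain randomization with calibrated, rather than fixed, ranges).
    policy = improve_policy(policy, rng.normal(mean, std, size=(64, 1)))
    print(f"iter {it}: posterior mean={mean[0]:.3f}  policy={policy:.3f}")
```

The loop encodes the interdependence the abstract highlights: each policy update changes the data the next inference step sees, and each posterior update changes the randomization distribution the next policy update trains on.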
