Paper Information - Variational Auto-Regularized Alignment for Sim-to-Real Control

Variational Auto-Regularized Alignment for Sim-to-Real Control

General-purpose simulators can be a valuable data source for flexible learning and control approaches. However, training models or control policies in simulation and then applying them directly to hardware can yield brittle control. Instead, we propose a novel way to use simulators as regularizers. Our approach regularizes the decoder of a variational autoencoder to a black-box simulation, with the latent space bound to a subset of simulator parameters. This enables successful encoder training from a small number of real-world trajectories (10 in our experiments), yielding a latent space whose distribution over simulation parameters matches the real-world setting. We use a learnable mixture for the latent prior/posterior, which gives a highly flexible class of densities for the posterior fit. Our approach is scalable and does not require restrictive distributional assumptions. We demonstrate the ability to recover matching parameter distributions on a range of benchmarks, challenging custom simulation environments, and several real-world scenarios. Our experiments using ABB YuMi robot hardware show that the approach can help reinforcement learning methods overcome cases of severe sim-to-real mismatch.
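
To make the idea of "simulator as regularizer" concrete, here is a minimal sketch in Python. It is not the authors' implementation: the toy simulator, the fixed Gaussian mixture standing in for the learnable prior, and every function name (`toy_simulator`, `sample_mixture`, `sim_regularizer`) are assumptions introduced for illustration only. It shows one plausible form of the penalty: sample simulator parameters from a mixture, then measure how far a decoder's output is from the black-box simulator's output at the same parameters.

```python
import numpy as np


def toy_simulator(theta, n_steps=20, dt=0.05):
    """Hypothetical black-box simulator: a 1-D point mass slowed by friction theta."""
    v, x, traj = 1.0, 0.0, []
    for _ in range(n_steps):
        v = v - theta * v * dt  # friction reduces velocity
        x = x + v * dt          # integrate position
        traj.append(x)
    return np.array(traj)


def sample_mixture(rng, weights, means, stds, n):
    """Draw n samples from a 1-D Gaussian mixture (a fixed stand-in for a learnable prior)."""
    comps = rng.choice(len(weights), size=n, p=weights)  # pick a component per sample
    return rng.normal(means[comps], stds[comps])


def sim_regularizer(decoder, thetas):
    """Mean squared gap between decoder and simulator outputs at the same parameters."""
    gaps = [np.mean((decoder(t) - toy_simulator(t)) ** 2) for t in thetas]
    return float(np.mean(gaps))


rng = np.random.default_rng(0)
weights = np.array([0.5, 0.5])
means = np.array([0.5, 2.0])
stds = np.array([0.1, 0.2])
thetas = sample_mixture(rng, weights, means, stds, n=8)

# A decoder identical to the simulator incurs no penalty; a biased one does.
perfect = sim_regularizer(toy_simulator, thetas)
biased = sim_regularizer(lambda t: toy_simulator(t) + 0.1, thetas)
```

In a full training loop, a term like this would presumably be added to the usual variational autoencoder objective on the real trajectories, so that the latent variables keep their meaning as simulator parameters; the weighting between the two terms is left open here.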

Danica Kragic | Rika Antonova | Martin Hwasser

[1] Danica Kragic, et al. Reinforcement Learning for Pivoting Task, 2017, ArXiv.

[2] Wojciech Zaremba, et al. Domain randomization for transferring deep neural networks from simulation to the real world, 2017, 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[3] Razvan Pascanu, et al. Sim-to-Real Robot Learning from Pixels with Progressive Nets, 2016, CoRL.

[4] Jakub W. Pachocki, et al. Learning dexterous in-hand manipulation, 2018, Int. J. Robotics Res.

[5] Geoffrey E. Hinton, et al. Layer Normalization, 2016, ArXiv.

[6] Wojciech Zaremba, et al. OpenAI Gym, 2016, ArXiv.

[7] Ronald Kemker, et al. Measuring Catastrophic Forgetting in Neural Networks, 2017, AAAI.

[8] Dieter Fox, et al. BayesSim: adaptive domain randomization via probabilistic inference for robotics simulators, 2019, Robotics: Science and Systems.

[9] Razvan Pascanu, et al. Progressive Neural Networks, 2016, ArXiv.

[10] Yevgen Chebotar, et al. Closing the Sim-to-Real Loop: Adapting Simulation Randomization with Real World Experience, 2018, 2019 International Conference on Robotics and Automation (ICRA).

[11] Danica Kragic, et al. VPE: Variational Policy Embedding for Transfer Reinforcement Learning, 2018, 2019 International Conference on Robotics and Automation (ICRA).

[12] Sergey Levine, et al. Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks, 2017, ICML.

[13] Yoshua Bengio, et al. An Empirical Investigation of Catastrophic Forgetting in Gradient-Based Neural Networks, 2013, ICLR.

[14] Alec Radford, et al. Proximal Policy Optimization Algorithms, 2017, ArXiv.

[15] Marcin Andrychowicz, et al. Hindsight Experience Replay, 2017, NIPS.

[16] R. French. Catastrophic forgetting in connectionist networks, 1999, Trends in Cognitive Sciences.

[17] Danica Kragic, et al. SimTrack: A simulation-based framework for scalable real-time object pose detection and tracking, 2015, 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[18] Mohammad Norouzi, et al. Understanding Posterior Collapse in Generative Latent Variable Models, 2019, DGS@ICLR.

[19] Yuval Tassa, et al. MuJoCo: A physics engine for model-based control, 2012, 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems.

[20] Honglak Lee, et al. Learning Structured Output Representation using Deep Conditional Generative Models, 2015, NIPS.

[21] Marcin Andrychowicz, et al. Asymmetric Actor Critic for Image-Based Robot Learning, 2017, Robotics: Science and Systems.

[22] Gaurav S. Sukhatme, et al. Zero-Shot Skill Composition and Simulation-to-Real Transfer by Learning Task Representations, 2018, ArXiv.

[23] Danica Kragic, et al. Dual arm manipulation - A survey, 2012, Robotics Auton. Syst.

[24] Shimon Whiteson, et al. VariBAD: A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning, 2020, ICLR.

[25] Siddhartha S. Srinivasa, et al. The YCB object and Model set: Towards common benchmarks for manipulation research, 2015, 2015 International Conference on Advanced Robotics (ICAR).

[26] Marcin Andrychowicz, et al. Sim-to-Real Transfer of Robotic Control with Dynamics Randomization, 2017, 2018 IEEE International Conference on Robotics and Automation (ICRA).

[27] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.

[28] Max Welling, et al. Auto-Encoding Variational Bayes, 2013, ICLR.