Context-aware Dynamics Model for Generalization in Model-Based Reinforcement Learning

Model-based reinforcement learning (RL) offers several benefits, such as data efficiency and the ability to plan, by learning a model of the environment's dynamics. However, learning a single global model that generalizes across environments with different dynamics is challenging. To tackle this problem, we decompose the task of learning a global dynamics model into two stages: (a) learning a context latent vector that captures the local dynamics, and (b) predicting the next state conditioned on that vector. To encode dynamics-specific information into the context latent vector, we introduce a novel loss function that encourages the context latent vector to be useful for predicting both forward and backward dynamics. The proposed method achieves superior generalization across a variety of simulated robotics and control tasks compared to existing RL schemes.
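A minimal sketch of the two-stage idea described above, written in PyTorch. All module architectures, names, and hyperparameters here are illustrative assumptions, not the authors' implementation: a context encoder maps a short window of past transitions to a latent vector, and forward and backward dynamics heads are trained jointly so that the latent must carry dynamics-specific information.

```python
# Hypothetical sketch of a context-conditioned dynamics objective.
# The MLP architectures and the K-step history window are assumptions.
import torch
import torch.nn as nn


class ContextEncoder(nn.Module):
    """Encodes K past (state, action) pairs into a context latent vector."""
    def __init__(self, state_dim, action_dim, context_dim, k=10, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(k * (state_dim + action_dim), hidden), nn.ReLU(),
            nn.Linear(hidden, context_dim),
        )

    def forward(self, past_states, past_actions):
        # past_states: (B, K, state_dim), past_actions: (B, K, action_dim)
        x = torch.cat([past_states, past_actions], dim=-1).flatten(1)
        return self.net(x)


class DynamicsHead(nn.Module):
    """Predicts a state delta from (state, action, context latent)."""
    def __init__(self, state_dim, action_dim, context_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim + context_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, state_dim),
        )

    def forward(self, state, action, context):
        return self.net(torch.cat([state, action, context], dim=-1))


def context_dynamics_loss(encoder, fwd_head, bwd_head,
                          past_s, past_a, s_t, a_t, s_next):
    """Joint forward + backward prediction loss: the context latent must be
    useful both for predicting s_{t+1} from (s_t, a_t) and for recovering
    s_t from (s_{t+1}, a_t)."""
    c = encoder(past_s, past_a)
    delta_fwd = fwd_head(s_t, a_t, c)      # predicted s_{t+1} - s_t
    delta_bwd = bwd_head(s_next, a_t, c)   # predicted s_t - s_{t+1}
    loss_fwd = ((s_t + delta_fwd) - s_next).pow(2).mean()
    loss_bwd = ((s_next + delta_bwd) - s_t).pow(2).mean()
    return loss_fwd + loss_bwd
```

In this sketch, conditioning both heads on the same latent is what pushes dynamics-specific information into the context vector: a latent that only helped one direction of prediction would incur loss on the other.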
