ED2: An Environment Dynamics Decomposition Framework for World Model Construction

Model-based reinforcement learning (MBRL) methods achieve significant sample efficiency on many tasks, but their performance is often limited by model error. To reduce model error, previous works fit the entire environment dynamics with a single well-designed network, treating the dynamics as a black box. However, these methods ignore the decomposability of the environment: the dynamics may consist of multiple sub-dynamics that can be modeled separately, allowing the world model to be constructed more accurately. In this paper, we propose Environment Dynamics Decomposition (ED2), a novel world model construction framework that models the environment dynamics in a decomposed manner. ED2 contains two key components: sub-dynamics discovery (SD2) and dynamics decomposition prediction (D2P). SD2 discovers the sub-dynamics in an environment, and D2P then constructs the decomposed world model following the discovered sub-dynamics. ED2 can be easily combined with existing MBRL algorithms, and empirical results show that it significantly reduces model error and boosts the performance of state-of-the-art MBRL algorithms on various tasks.
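As a concrete illustration of what a decomposed world model could look like, the minimal sketch below assumes that each discovered sub-dynamics is tied to a subset of the action dimensions and that each sub-model predicts a state-change contribution that is summed into the full next-state prediction. This is an assumption for illustration only; the class and parameter names (SubDynamicsModel, DecomposedWorldModel, action_groups) are hypothetical and not taken from the paper.

```python
# Hypothetical sketch of a decomposed world model in the spirit of D2P.
# Assumption (not from the abstract): sub-dynamics correspond to groups of
# action dimensions, and per-group state-change predictions are summed.
import torch
import torch.nn as nn


class SubDynamicsModel(nn.Module):
    """Predicts one sub-dynamics' contribution to the state change."""

    def __init__(self, state_dim: int, sub_action_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + sub_action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, state_dim),
        )

    def forward(self, state, sub_action):
        return self.net(torch.cat([state, sub_action], dim=-1))


class DecomposedWorldModel(nn.Module):
    """Aggregates per-sub-dynamics predictions into a next-state estimate."""

    def __init__(self, state_dim: int, action_groups):
        super().__init__()
        # action_groups: list of lists of action indices, e.g. the output of
        # a sub-dynamics discovery step such as SD2 (grouping is assumed here).
        self.action_groups = action_groups
        self.sub_models = nn.ModuleList(
            SubDynamicsModel(state_dim, len(g)) for g in action_groups
        )

    def forward(self, state, action):
        # Sum the state-change contributions of all sub-models.
        delta = sum(
            m(state, action[..., g])
            for m, g in zip(self.sub_models, self.action_groups)
        )
        return state + delta
```

Under these assumptions, the decomposed model plugs into an MBRL loop wherever a monolithic dynamics network would be used, e.g. `next_state_pred = model(state, action)` trained with a regression loss against observed transitions.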
