Planning from Pixels using Inverse Dynamics Models

Learning task-agnostic dynamics models in high-dimensional observation spaces can be challenging for model-based RL agents. We propose a novel way to learn latent world models by training them to predict sequences of future actions conditioned on task completion. These task-conditioned models adaptively focus modeling capacity on task-relevant dynamics, while simultaneously serving as an effective heuristic for planning with sparse rewards. We evaluate our method on challenging visual goal-completion tasks and show a substantial increase in performance over prior model-free approaches.
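As a rough illustration of the idea above, the sketch below shows a goal-conditioned model that predicts a sequence of future actions given the current observation and a desired outcome, trained with supervised learning on relabeled trajectories. All module names, sizes, and the choice of an LSTM decoder are illustrative assumptions, not the paper's actual architecture.

```python
# Minimal sketch (assumptions, not the paper's implementation): a model that,
# given the current observation and a goal, predicts the sequence of actions
# leading to task completion. Such a model can be used as a planning heuristic
# under sparse rewards.
import torch
import torch.nn as nn

class ActionSequenceModel(nn.Module):
    def __init__(self, obs_dim, goal_dim, n_actions, hidden=256):
        super().__init__()
        # Encode the current observation and the desired goal into one context vector.
        self.encoder = nn.Sequential(
            nn.Linear(obs_dim + goal_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        # Autoregressive decoder over future actions (one step per predicted action).
        self.decoder = nn.LSTM(input_size=n_actions, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_actions)
        self.n_actions = n_actions

    def forward(self, obs, goal, action_seq):
        # Teacher forcing: condition on previous ground-truth actions (one-hot),
        # shifted right so the first step sees a zero "start" token.
        ctx = self.encoder(torch.cat([obs, goal], dim=-1))           # (B, hidden)
        prev = nn.functional.one_hot(action_seq, self.n_actions).float()
        prev = torch.cat([torch.zeros_like(prev[:, :1]), prev[:, :-1]], dim=1)
        h0 = ctx.unsqueeze(0)                                         # (1, B, hidden)
        out, _ = self.decoder(prev, (h0, torch.zeros_like(h0)))
        return self.head(out)                                         # (B, T, n_actions)

def loss_fn(model, obs, goal, action_seq):
    # Supervised training on hindsight-relabeled trajectories: any reached state
    # can serve as the "completed task" the action sequence is conditioned on.
    logits = model(obs, goal, action_seq)
    return nn.functional.cross_entropy(
        logits.reshape(-1, logits.shape[-1]), action_seq.reshape(-1))
```

At planning time, action sequences sampled from such a model could seed or score candidate plans toward a given goal; this usage is an assumption for illustration rather than a description of the paper's exact planner.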
