Planning from Pixels using Inverse Dynamics Models

Learning task-agnostic dynamics models in high-dimensional observation spaces can be challenging for model-based RL agents. We propose a novel way to learn latent world models by training them to predict sequences of future actions conditioned on task completion. These task-conditioned models adaptively focus modeling capacity on task-relevant dynamics, while simultaneously serving as an effective heuristic for planning with sparse rewards. We evaluate our method on challenging visual goal-completion tasks and show a substantial increase in performance over prior model-free approaches.
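As a rough illustration of the idea above, the sketch below shows a goal-conditioned model that predicts a sequence of future actions given the current observation and a desired outcome, trained with supervised learning on relabeled trajectories. All module names, sizes, and the choice of an LSTM decoder are illustrative assumptions, not the paper's actual architecture.

```python
# Minimal sketch (assumptions, not the paper's implementation): a model that,
# given the current observation and a goal, predicts the sequence of actions
# leading to task completion. Such a model can be used as a planning heuristic
# under sparse rewards.
import torch
import torch.nn as nn

class ActionSequenceModel(nn.Module):
    def __init__(self, obs_dim, goal_dim, n_actions, hidden=256):
        super().__init__()
        # Encode the current observation and the desired goal into one context vector.
        self.encoder = nn.Sequential(
            nn.Linear(obs_dim + goal_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        # Autoregressive decoder over future actions (one step per predicted action).
        self.decoder = nn.LSTM(input_size=n_actions, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_actions)
        self.n_actions = n_actions

    def forward(self, obs, goal, action_seq):
        # Teacher forcing: condition on previous ground-truth actions (one-hot),
        # shifted right so the first step sees a zero "start" token.
        ctx = self.encoder(torch.cat([obs, goal], dim=-1))           # (B, hidden)
        prev = nn.functional.one_hot(action_seq, self.n_actions).float()
        prev = torch.cat([torch.zeros_like(prev[:, :1]), prev[:, :-1]], dim=1)
        h0 = ctx.unsqueeze(0)                                         # (1, B, hidden)
        out, _ = self.decoder(prev, (h0, torch.zeros_like(h0)))
        return self.head(out)                                         # (B, T, n_actions)

def loss_fn(model, obs, goal, action_seq):
    # Supervised training on hindsight-relabeled trajectories: any reached state
    # can serve as the "completed task" the action sequence is conditioned on.
    logits = model(obs, goal, action_seq)
    return nn.functional.cross_entropy(
        logits.reshape(-1, logits.shape[-1]), action_seq.reshape(-1))
```

At planning time, action sequences sampled from such a model could seed or score candidate plans toward a given goal; this usage is an assumption for illustration rather than a description of the paper's exact planner.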
