论文信息 - Task-Agnostic Dynamics Priors for Deep Reinforcement Learning

Task-Agnostic Dynamics Priors for Deep Reinforcement Learning

While model-based deep reinforcement learning (RL) holds great promise for sample efficiency and generalization, learning an accurate dynamics model is often challenging and requires substantial interaction with the environment. A wide variety of domains have dynamics that share common foundations like the laws of classical mechanics, which are rarely exploited by existing algorithms. In fact, humans continuously acquire and use such dynamics priors to easily adapt to operating in new environments. In this work, we propose an approach to learn task-agnostic dynamics priors from videos and incorporate them into an RL agent. Our method involves pre-training a frame predictor on task-agnostic physics videos to initialize dynamics models (and fine-tune them) for unseen target environments. Our frame prediction architecture, SpatialNet, is designed specifically to capture localized physical phenomena and interactions. Our approach allows for both faster policy learning and convergence to better policies, outperforming competitive approaches on several different environments. We also demonstrate that incorporating this prior allows for more effective transfer between environments.

Yilun Du | Karthik Narasimhan | Karthik Narasimhan | Yilun Du

[1] Richard S. Sutton,et al. Integrated Architectures for Learning, Planning, and Reacting Based on Approximating Dynamic Programming , 1990, ML.

[2] Peter Dayan,et al. Improving Generalization for Temporal Difference Learning: The Successor Representation , 1993, Neural Computation.

[3] Jan Peters,et al. Using model knowledge for learning inverse dynamics , 2010, 2010 IEEE International Conference on Robotics and Automation.

[4] Alex Graves,et al. Playing Atari with Deep Reinforcement Learning , 2013, ArXiv.

[5] David Wingate,et al. A Physics-Based Model Prior for Object-Oriented MDPs , 2014, ICML.

[6] Jonathan P. How,et al. Reinforcement learning with multi-fidelity simulators , 2014, 2014 IEEE International Conference on Robotics and Automation (ICRA).

[7] Peter Stone,et al. The Impact of Determinism on Learning Atari 2600 Games , 2015, AAAI Workshop: Learning for General Competency in Video Games.

[8] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.

[9] Viorica Patraucean,et al. Spatio-temporal video autoencoder with differentiable memory , 2015, ArXiv.

[10] Jonathan P. How,et al. Efficient reinforcement learning for robots using informative simulated priors , 2015, 2015 IEEE International Conference on Robotics and Automation (ICRA).

[11] Shane Legg,et al. Human-level control through deep reinforcement learning , 2015, Nature.

[12] Honglak Lee,et al. Action-Conditional Video Prediction using Deep Networks in Atari Games , 2015, NIPS.

[13] Marc G. Bellemare,et al. The Arcade Learning Environment: An Evaluation Platform for General Agents , 2012, J. Artif. Intell. Res..

[14] Dit-Yan Yeung,et al. Convolutional LSTM Network: A Machine Learning Approach for Precipitation Nowcasting , 2015, NIPS.

[15] Samuel Gershman,et al. Deep Successor Reinforcement Learning , 2016, ArXiv.

[16] Yann LeCun,et al. Deep multi-scale video prediction beyond mean square error , 2015, ICLR.

[17] Demis Hassabis,et al. Mastering the game of Go with deep neural networks and tree search , 2016, Nature.

[18] Wojciech Jaskowski,et al. ViZDoom: A Doom-based AI research platform for visual reinforcement learning , 2016, 2016 IEEE Conference on Computational Intelligence and Games (CIG).

[19] Sergey Levine,et al. Model-based reinforcement learning with parametrized physical models and optimism-driven exploration , 2015, 2016 IEEE International Conference on Robotics and Automation (ICRA).

[20] Razvan Pascanu,et al. Interaction Networks for Learning about Objects, Relations and Physics , 2016, NIPS.

[21] Sergey Levine,et al. Unsupervised Learning for Physical Interaction through Video Prediction , 2016, NIPS.

[22] Sepp Hochreiter,et al. Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs) , 2015, ICLR.

[23] Alexei A. Efros,et al. Curiosity-Driven Exploration by Self-Supervised Prediction , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[24] Xiqun Chen,et al. Short-Term Forecasting of Passenger Demand under On-Demand Ride Services: A Spatio-Temporal Deep Learning Approach , 2017, ArXiv.

[25] Razvan Pascanu,et al. Visual Interaction Networks: Learning a Physics Simulator from Video , 2017, NIPS.

[26] Razvan Pascanu,et al. Imagination-Augmented Agents for Deep Reinforcement Learning , 2017, NIPS.

[27] Philippe Beaudoin,et al. Independently Controllable Factors , 2017, ArXiv.

[28] Juan Song,et al. Multimodal Gesture Recognition Using 3-D Convolution and Convolutional LSTM , 2017, IEEE Access.

[29] Dileep George,et al. Schema Networks: Zero-shot Transfer with a Generative Causal Model of Intuitive Physics , 2017, ICML.

[30] Tom Schaul,et al. Successor Features for Transfer in Reinforcement Learning , 2016, NIPS.

[31] Alec Radford,et al. Proximal Policy Optimization Algorithms , 2017, ArXiv.

[32] Marlos C. Machado,et al. Revisiting the Arcade Learning Environment: Evaluation Protocols and Open Problems for General Agents , 2017, J. Artif. Intell. Res..

[33] Ruslan Salakhutdinov,et al. Neural Map: Structured Memory for Deep Reinforcement Learning , 2017, ICLR.

[34] Samy Bengio,et al. A Study on Overfitting in Deep Reinforcement Learning , 2018, ArXiv.

[35] Marlos C. Machado,et al. Eigenoption Discovery through the Deep Successor Representation , 2017, ICLR.

[36] Alexei A. Efros,et al. Investigating Human Priors for Playing Video Games , 2018, ICML.

[37] John Schulman,et al. Gotta Learn Fast: A New Benchmark for Generalization in RL , 2018, ArXiv.

[38] Sergey Levine,et al. Model-Based Value Estimation for Efficient Model-Free Reinforcement Learning , 2018, ArXiv.

[39] Daniel L. K. Yamins,et al. Flexible Neural Representation for Physics Prediction , 2018, NeurIPS.

[40] Sinno Jialin Pan,et al. Hashing Over Predicted Future Frames for Informed Exploration of Deep Reinforcement Learning , 2017, IJCAI.

[41] Joelle Pineau,et al. Decoupling Dynamics and Reward for Transfer Learning , 2018, ICLR.