Towards a Simple Approach to Multi-step Model-based Reinforcement Learning

When environmental interaction is expensive, model-based reinforcement learning offers a solution by planning ahead and avoiding costly mistakes. Model-based agents typically learn a single-step transition model. In this paper, we propose a multi-step model that predicts the outcome of an action sequence of variable length. We show that this model is easy to learn and that it can make policy-conditional predictions. We report preliminary results that show a clear advantage for the multi-step model over its one-step counterpart.
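
As a rough, illustrative sketch (not the authors' implementation), a multi-step model of this kind can be realized as a network that encodes the current state, consumes a variable-length action sequence with a recurrent layer, and decodes the resulting hidden state into a prediction of the state reached after executing the whole sequence. The class name, layer sizes, GRU-based encoder, and training routine below are assumptions made purely for illustration.

    # Minimal sketch of a multi-step transition model: given a start state and a
    # variable-length action sequence, predict the state reached after executing
    # the entire sequence. All names and sizes here are illustrative assumptions.
    import torch
    import torch.nn as nn

    class MultiStepModel(nn.Module):
        def __init__(self, state_dim: int, action_dim: int, hidden_dim: int = 64):
            super().__init__()
            # Encode the starting state into the recurrent hidden state.
            self.state_encoder = nn.Linear(state_dim, hidden_dim)
            # Consume the action sequence one action at a time.
            self.rnn = nn.GRU(action_dim, hidden_dim, batch_first=True)
            # Decode the final hidden state into a predicted future state.
            self.state_decoder = nn.Linear(hidden_dim, state_dim)

        def forward(self, state: torch.Tensor, actions: torch.Tensor) -> torch.Tensor:
            # state:   (batch, state_dim)
            # actions: (batch, seq_len, action_dim); seq_len can vary across calls
            h0 = torch.tanh(self.state_encoder(state)).unsqueeze(0)  # (1, batch, hidden)
            _, h_final = self.rnn(actions, h0)
            return self.state_decoder(h_final.squeeze(0))

    # Training sketch: regress the model's k-step prediction onto the state
    # actually observed after the k actions in a stored trajectory segment.
    def train_step(model, optimizer, state, actions, target_state):
        pred = model(state, actions)
        loss = nn.functional.mse_loss(pred, target_state)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()

At decision time, an agent could query such a model with candidate action sequences of different lengths and compare the predicted outcomes directly, rather than chaining many one-step predictions and compounding their errors.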
