Brief Survey of Model-Based Reinforcement Learning Techniques

Model-free reinforcement learning (MFRL) algorithms usually achieve better asymptotic performance than model-based reinforcement learning (MBRL) algorithms, especially in complex environments. However, MBRL algorithms are often much more sample-efficient, and some are able to learn control tasks in just a handful of trials. Moreover, in some domains, MBRL algorithms can match the performance of MFRL algorithms while requiring fewer samples. In recent years, MBRL research has expanded into various application domains, such as robot control tasks and game environments with complex observations. In this paper, we review the most popular techniques used in MBRL and discuss useful ways of classifying the algorithms in this area.
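The sample-efficiency argument can be illustrated with a small, hypothetical Dyna-Q-style sketch (not code from this survey or from the papers it reviews). It assumes a deterministic five-state corridor where the only reward (1.0) is given on reaching the last state, and a tabular Q-function; all names in it are illustrative. Each real transition is used once for a model-free Q-learning update and is also stored in a learned model, which is then replayed for extra planning updates. To keep the example deterministic, planning sweeps over every stored state-action pair instead of sampling pairs at random as the original Dyna-Q does.

```python
from collections import defaultdict

ACTIONS = (-1, 1)  # hypothetical corridor actions: move left or move right


def q_update(Q, s, a, r, s_next, done, alpha, gamma):
    """One tabular Q-learning update on a real or simulated transition."""
    target = r if done else r + gamma * max(Q[(s_next, b)] for b in ACTIONS)
    Q[(s, a)] += alpha * (target - Q[(s, a)])


def dyna_q(transitions, planning_sweeps, alpha=0.5, gamma=0.9):
    """Dyna-Q-style learning from a fixed list of real transitions.

    Assumes a deterministic environment, so the learned model simply
    memorizes the observed outcome of each (state, action) pair.
    """
    Q = defaultdict(float)
    model = {}
    for s, a, r, s_next, done in transitions:
        # Direct RL: model-free update from the real transition.
        q_update(Q, s, a, r, s_next, done, alpha, gamma)
        # Model learning: record what the environment did.
        model[(s, a)] = (r, s_next, done)
        # Planning: replay simulated transitions produced by the learned model.
        for _ in range(planning_sweeps):
            for (ps, pa), (pr, ps_next, pdone) in model.items():
                q_update(Q, ps, pa, pr, ps_next, pdone, alpha, gamma)
    return Q


# One real episode in the hypothetical corridor: move right from state 0 to goal state 4.
trajectory = [
    (0, 1, 0.0, 1, False),
    (1, 1, 0.0, 2, False),
    (2, 1, 0.0, 3, False),
    (3, 1, 1.0, 4, True),
]

model_free = dyna_q(trajectory, planning_sweeps=0)
model_based = dyna_q(trajectory, planning_sweeps=5)
print(model_free[(0, 1)], model_based[(0, 1)] > 0)  # prints: 0.0 True
```

With the same four real transitions, plain Q-learning (`planning_sweeps=0`) only assigns value to the step that reaches the goal, whereas the planning updates propagate that reward back to the start state. In this toy setting the model is exact; with learned models of larger, stochastic environments, the benefit depends on how accurate the model is.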
