Maximum Entropy Model-based Reinforcement Learning

Recent advances in reinforcement learning have demonstrated its ability to solve hard agent-environment interaction tasks at a super-human level. However, the application of reinforcement learning methods to practical, real-world tasks is currently limited by the sample inefficiency of most state-of-the-art RL algorithms, i.e., the need for a vast number of training episodes. For example, the OpenAI Five agent that beat human players in Dota 2 was trained on the equivalent of thousands of years of game time. Several approaches tackle the issue of sample inefficiency, either by using already gathered experience more efficiently or by collecting more relevant and diverse experience through better exploration of the environment. However, to our knowledge, no such approach exists for model-based algorithms, which have shown high sample efficiency in solving hard control tasks with high-dimensional state spaces. This work connects exploration techniques and model-based reinforcement learning. We design a novel exploration method that takes into account the features of the model-based approach, and we demonstrate through experiments that our method significantly improves the performance of the model-based algorithm Dreamer.
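The abstract only names the approach, so here is a minimal sketch of how a maximum-entropy exploration bonus is commonly combined with a latent world model such as Dreamer's: the entropy of visited latent states is approximated with a k-nearest-neighbor particle estimator and added to the predicted task reward. The estimator choice, the function names, and the `beta` coefficient are illustrative assumptions, not details taken from the paper.

```python
import torch


def knn_entropy_bonus(latent_states: torch.Tensor, k: int = 5) -> torch.Tensor:
    """Particle-based entropy estimate over a batch of latent states.

    Each state's bonus grows with the distance to its k-th nearest
    neighbor in the batch, so rarely visited regions of latent space
    receive a larger exploration reward.
    """
    # Pairwise Euclidean distances between latent states, shape (B, B).
    dists = torch.cdist(latent_states, latent_states)
    # Distance to the k-th nearest neighbor; index 0 is the point itself.
    knn_dist = dists.topk(k + 1, largest=False).values[:, -1]
    return torch.log(knn_dist + 1e-6)


def augmented_reward(extrinsic: torch.Tensor,
                     latent_states: torch.Tensor,
                     beta: float = 0.1) -> torch.Tensor:
    """Task reward plus a scaled maximum-entropy exploration bonus."""
    return extrinsic + beta * knn_entropy_bonus(latent_states)
```

In a Dreamer-style agent, such a bonus would typically be computed on imagined latent trajectories before the actor-critic update, leaving world-model training unchanged; this placement is likewise an assumption for illustration.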

[1] Wojciech M. Czarnecki et al. Grandmaster level in StarCraft II using multi-agent reinforcement learning, 2019, Nature.

[2] P. Kostenetskiy et al. HPC Resources of the Higher School of Economics, 2021, Journal of Physics: Conference Series.

[3] Joelle Pineau et al. Novelty Search in representational space for sample efficient exploration, 2020, NeurIPS.

[4] Razvan Pascanu et al. Spectral Normalisation for Deep Reinforcement Learning: an Optimisation Perspective, 2021, ICML.

[5] Jimmy Ba et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.

[6] Alessandro Lazaric et al. Mastering Visual Continuous Control: Improved Data-Augmented Reinforcement Learning, 2021, ICLR.

[7] Sham M. Kakade et al. Provably Efficient Maximum Entropy Exploration, 2018, ICML.

[8] S. Srihari. Mixture Density Networks, 1994.

[9] Mohammad Norouzi et al. Dream to Control: Learning Behaviors by Latent Imagination, 2019, ICLR.

[10] Ruben Villegas et al. Learning Latent Dynamics for Planning from Pixels, 2018, ICML.

[11] Tom Schaul et al. Prioritized Experience Replay, 2015, ICLR.

[12] Sergey Levine et al. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor, 2018, ICML.

[13] Amos J. Storkey et al. Exploration by Random Network Distillation, 2018, ICLR.

[14] Pieter Abbeel et al. CURL: Contrastive Unsupervised Representations for Reinforcement Learning, 2020, ICML.

[15] Marcin Andrychowicz et al. Solving Rubik's Cube with a Robot Hand, 2019, ArXiv.

[16] Jakub W. Pachocki et al. Dota 2 with Large Scale Deep Reinforcement Learning, 2019, ArXiv.

[17] Mohammad Norouzi et al. Mastering Atari with Discrete World Models, 2020, ICLR.

[18] Tadahiro Taniguchi et al. Dreaming: Model-based Reinforcement Learning by Latent Imagination without Reconstruction, 2020, ArXiv.

[19] Nicu Sebe et al. Latent World Models For Intrinsically Motivated Exploration, 2020, NeurIPS.

[20] Yuval Tassa et al. Continuous control with deep reinforcement learning, 2015, ICLR.

[21] Alec Radford et al. Proximal Policy Optimization Algorithms, 2017, ArXiv.

[22] Alexei A. Efros et al. Curiosity-Driven Exploration by Self-Supervised Prediction, 2017, IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).