Fixed $\beta$-VAE Encoding for Curious Exploration in Complex 3D Environments

Curiosity is a general method for augmenting an environment reward with an intrinsic reward, which encourages exploration and is especially useful in sparse-reward settings. Because curiosity is computed from next-state prediction error, the type of state encoding used has a large impact on performance. Random features and inverse-dynamics features are generally preferred over VAEs, based on previous results from Atari and other mostly 2D environments. However, unlike VAEs, they may not encode sufficient information for optimal behaviour, which becomes increasingly important as environments grow more complex. In this paper, we use the sparse-reward 3D physics environment Animal-AI to demonstrate how a fixed $\beta$-VAE encoding can be used effectively with curiosity. We combine this with curriculum learning to solve the previously unsolved, exploration-intensive detour tasks, while achieving a 22% gain in sample efficiency on the training curriculum over the next-best encoding. We also corroborate these results on Atari Breakout, where our custom encoding outperforms random features and inverse-dynamics features.
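To make the mechanism concrete, below is a minimal sketch (in PyTorch) of curiosity computed as next-state prediction error in the latent space of a fixed encoder. The class and function names (FixedEncoder, ForwardModel, intrinsic_reward) and the small fully-connected networks are illustrative assumptions, not the architecture used in the paper; the key point is only that the encoder's weights stay frozen while the forward model and policy train.

import torch
import torch.nn as nn

class FixedEncoder(nn.Module):
    # Stand-in for a pretrained beta-VAE encoder; weights are frozen so the
    # latent space does not drift during reinforcement learning.
    def __init__(self, obs_dim=3 * 64 * 64, latent_dim=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Flatten(),
            nn.Linear(obs_dim, 256), nn.ReLU(),
            nn.Linear(256, latent_dim),
        )
        for p in self.parameters():
            p.requires_grad = False

    def forward(self, obs):
        return self.net(obs)

class ForwardModel(nn.Module):
    # Predicts the next latent state from the current latent state and action.
    def __init__(self, latent_dim=32, action_dim=4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim + action_dim, 256), nn.ReLU(),
            nn.Linear(256, latent_dim),
        )

    def forward(self, z, action):
        return self.net(torch.cat([z, action], dim=-1))

def intrinsic_reward(encoder, forward_model, obs, action, next_obs, scale=1.0):
    # Curiosity bonus: squared error between the predicted and observed next
    # latent state, added to the (possibly sparse) environment reward.
    with torch.no_grad():
        z, z_next = encoder(obs), encoder(next_obs)
    z_pred = forward_model(z, action)
    return scale * 0.5 * (z_pred - z_next).pow(2).sum(dim=-1)

For a batch of transitions, intrinsic_reward(enc, fwd, obs, one_hot_actions, next_obs) returns one bonus per transition; the same prediction error can serve as the training loss for the forward model, while the encoder itself is never updated.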
