Semantic Exploration from Language Abstractions and Pretrained Representations

Effective exploration is a challenge in reinforcement learning (RL). Novelty-based exploration methods can suffer in high-dimensional state spaces, such as continuous, partially observable 3D environments. We address this challenge by defining novelty using semantically meaningful state abstractions, which can be found in learned representations shaped by natural language. In particular, we evaluate vision-language representations pretrained on natural image-captioning datasets. We show that these pretrained representations drive meaningful, task-relevant exploration and improve performance in 3D simulated environments. We also characterize why and how language provides useful abstractions for exploration by considering the impact of using representations from a pretrained model, a language oracle, and several ablations. We demonstrate the benefits of our approach in two very different task domains, one that stresses the identification and manipulation of everyday objects and one that requires navigational exploration in an expansive world, and with two popular deep RL algorithms: IMPALA and R2D2. Our results suggest that using language-shaped representations could improve exploration for various algorithms and agents in challenging environments.
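To make the core recipe concrete, the sketch below illustrates one way a novelty bonus can be computed in a frozen, language-shaped embedding space rather than over raw pixels. This is a minimal illustration under stated assumptions, not the paper's implementation: the `encode` function (here a random projection standing in for a CLIP-style image encoder), the inverse-kernel bonus, and the specific constants are all illustrative choices in the spirit of episodic-novelty methods such as NGU.

```python
# Minimal sketch (not the authors' code): an episodic novelty bonus computed in a
# pretrained vision-language embedding space. `encode` is a stand-in for a frozen
# encoder such as CLIP's image tower; here it is a random projection purely so the
# snippet runs without external weights.
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(64, 3 * 84 * 84)) / np.sqrt(3 * 84 * 84)

def encode(obs: np.ndarray) -> np.ndarray:
    """Hypothetical frozen vision-language encoder (stand-in: random projection)."""
    z = W @ obs.ravel()
    return z / (np.linalg.norm(z) + 1e-8)

def episodic_novelty(z: np.ndarray, memory: list[np.ndarray], k: int = 10,
                     eps: float = 1e-3) -> float:
    """Inverse-kernel bonus over the k nearest embeddings seen so far this episode."""
    if not memory:
        return 1.0
    dists = np.sort([np.sum((z - m) ** 2) for m in memory])[:k]
    dists = dists / (dists.mean() + 1e-8)      # normalize squared distances (simplified)
    kernel = eps / (dists + eps)               # similarity kernel: ~1 for near-duplicates
    return 1.0 / (np.sqrt(kernel.sum()) + 1e-3)

# Usage: add the intrinsic reward to the task reward at each environment step.
memory: list[np.ndarray] = []
for step in range(5):
    obs = rng.random((3, 84, 84))              # placeholder observation
    z = encode(obs)
    r_int = episodic_novelty(z, memory)
    memory.append(z)
    print(f"step {step}: intrinsic reward = {r_int:.3f}")
```

Because the encoder is frozen and was shaped by natural-language supervision, states that differ only in visually salient but semantically irrelevant ways tend to map to nearby embeddings, so the bonus concentrates on semantically novel states; the same bonus can in principle be fed to either an IMPALA- or R2D2-style learner.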
