Language as a Cognitive Tool to Imagine Goals in Curiosity-Driven Exploration

Developmental machine learning studies how artificial agents can model the way children learn open-ended repertoires of skills. Such agents need to create and represent goals, select which ones to pursue, and learn to achieve them. Recent approaches have considered goal spaces that were either fixed and hand-defined or learned using generative models of states. This limits agents to sampling goals within the distribution of known effects. We argue that the ability to imagine out-of-distribution goals is key to enabling creative discoveries and open-ended learning. Children do so by leveraging the compositionality of language as a tool to imagine descriptions of outcomes they have never experienced before, targeting them as goals during play. We introduce IMAGINE, an intrinsically motivated deep reinforcement learning architecture that models this ability. Such imaginative agents, like children, benefit from the guidance of a social peer who provides language descriptions. To take advantage of goal imagination, agents must be able to leverage these descriptions to interpret their imagined out-of-distribution goals. This generalization is made possible by modularity: a decomposition between the learned goal-achievement reward function and the policy, both relying on deep sets, gated attention, and object-centered representations. We introduce the Playground environment and study how this form of goal imagination improves generalization and exploration over agents lacking this capacity. In addition, we identify the properties of goal imagination that enable these results and study the impact of modularity and social interactions.
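To make the architectural claim concrete, here is a minimal sketch (not the authors' code; module names and dimensions are illustrative assumptions) of a goal-conditioned reward function that combines gated attention with a deep-set encoder over object-centered representations, as the abstract describes:

```python
# Illustrative sketch of a goal-achievement reward function:
# a language-goal embedding gates per-object features (gated attention),
# and a permutation-invariant deep-set encoder pools over objects.
import torch
import torch.nn as nn

class GatedAttentionDeepSetReward(nn.Module):
    def __init__(self, obj_dim: int, goal_dim: int, hidden: int = 64):
        super().__init__()
        # Gated attention: the goal embedding produces an element-wise
        # gate applied to every object's feature vector.
        self.gate = nn.Sequential(nn.Linear(goal_dim, obj_dim), nn.Sigmoid())
        # Deep set: a shared network phi maps each gated object to a latent
        # code; summing makes the encoding permutation-invariant; rho reads
        # out the probability that the goal is achieved in this state.
        self.phi = nn.Sequential(nn.Linear(obj_dim, hidden), nn.ReLU())
        self.rho = nn.Sequential(nn.Linear(hidden, 1), nn.Sigmoid())

    def forward(self, objects: torch.Tensor, goal: torch.Tensor) -> torch.Tensor:
        # objects: (batch, n_objects, obj_dim); goal: (batch, goal_dim)
        gate = self.gate(goal).unsqueeze(1)    # (batch, 1, obj_dim)
        per_obj = self.phi(objects * gate)     # shared phi on gated objects
        pooled = per_obj.sum(dim=1)            # permutation-invariant pooling
        return self.rho(pooled)                # P(goal achieved | state)
```

Because the same phi is shared across objects and the gate is driven only by the goal embedding, the model can score descriptions it was never trained on, which is what lets imagined out-of-distribution goals be interpreted at all.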
