Learning to Play with Intrinsically-Motivated Self-Aware Agents

Infants are experts at playing, with an amazing ability to generate novel structured behaviors in unstructured environments that lack clear extrinsic reward signals. We seek to mathematically formalize these abilities using a neural network that implements curiosity-driven intrinsic motivation. Using a simple but ecologically naturalistic simulated environment in which an agent can move and interact with objects it sees, we propose a "world-model" network that learns to predict the dynamic consequences of the agent's actions. Simultaneously, we train a separate explicit "self-model" that allows the agent to track the error map of its own world-model; the agent then uses this self-model to adversarially challenge the developing world-model. We demonstrate that this policy causes the agent to explore novel and informative interactions with its environment, leading to the generation of a spectrum of complex behaviors, including ego-motion prediction, object attention, and object gathering. Moreover, the world-model that the agent learns supports improved performance on object dynamics prediction, detection, localization, and recognition tasks. Taken together, our results are initial steps toward creating flexible autonomous agents that self-supervise in complex, novel physical environments.
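The abstract describes a three-part loop: a world-model that predicts the consequences of actions, a self-model that predicts the world-model's error, and a policy that prefers actions whose predicted error is largest. The sketch below is a minimal, hypothetical illustration of that loop, not the paper's implementation: the module architectures, the dimensions OBS_DIM and ACT_DIM, the optimizer settings, and the random-candidate action search are all assumptions made for brevity, whereas the actual agent operates on image observations in a simulated 3D physics environment.

```python
# Minimal sketch (not the authors' code) of the adversarial world-model /
# self-model loop described in the abstract. All shapes, names, and the
# action-selection rule are illustrative assumptions.
import torch
import torch.nn as nn

OBS_DIM, ACT_DIM, HIDDEN = 64, 4, 128  # assumed toy dimensions

class WorldModel(nn.Module):
    """Predicts the next observation from the current observation and action."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(OBS_DIM + ACT_DIM, HIDDEN), nn.ReLU(),
            nn.Linear(HIDDEN, OBS_DIM))
    def forward(self, obs, act):
        return self.net(torch.cat([obs, act], dim=-1))

class SelfModel(nn.Module):
    """Predicts the world-model's error for a candidate (obs, action) pair."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(OBS_DIM + ACT_DIM, HIDDEN), nn.ReLU(),
            nn.Linear(HIDDEN, 1))
    def forward(self, obs, act):
        return self.net(torch.cat([obs, act], dim=-1)).squeeze(-1)

world_model, self_model = WorldModel(), SelfModel()
wm_opt = torch.optim.Adam(world_model.parameters(), lr=1e-4)
sm_opt = torch.optim.Adam(self_model.parameters(), lr=1e-4)

def choose_action(obs, n_candidates=32):
    """Adversarial policy: sample candidate actions and pick the one whose
    predicted world-model error (per the self-model) is largest."""
    candidates = torch.randn(n_candidates, ACT_DIM)
    obs_batch = obs.expand(n_candidates, -1)
    with torch.no_grad():
        predicted_error = self_model(obs_batch, candidates)
    return candidates[predicted_error.argmax()]

def update(obs, act, next_obs):
    """One training step: fit the world-model to the observed transition, then
    fit the self-model to the world-model's (detached) prediction error."""
    wm_loss = (world_model(obs, act) - next_obs).pow(2).mean()
    wm_opt.zero_grad(); wm_loss.backward(); wm_opt.step()

    sm_loss = (self_model(obs, act) - wm_loss.detach()).pow(2).mean()
    sm_opt.zero_grad(); sm_loss.backward(); sm_opt.step()

# Example interaction step (dummy tensors stand in for the environment):
obs = torch.randn(OBS_DIM)
act = choose_action(obs)
next_obs = torch.randn(OBS_DIM)  # would come from the simulator
update(obs, act, next_obs)
```

In this toy version the intrinsic signal is simply the world-model's prediction error, which the self-model learns to estimate; maximizing the self-model's output therefore steers the agent toward transitions its world-model has not yet mastered, mirroring the adversarial dynamic between the two networks described above.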
