Active World Model Learning in Agent-rich Environments with Progress Curiosity

World models are self-supervised predictive models of how the world evolves. Humans learn world models by curiously exploring their environment, in the process acquiring compact abstractions of high-bandwidth sensory inputs, the ability to plan across long temporal horizons, and an understanding of the behavioral patterns of other agents. In this work, we study how to design such a curiosity-driven Active World Model Learning (AWML) system. To do so, we construct a curious agent that builds world models while visually exploring a 3D physical environment rich with distillations of representative real-world agents. We propose an AWML system driven by γ-Progress: a scalable and effective learning-progress-based curiosity signal. We show that γ-Progress naturally gives rise to an exploration policy that directs attention to complex but learnable dynamics in a balanced manner, thereby overcoming the “white noise problem”. As a result, our γ-Progress-driven controller achieves significantly higher AWML performance than baseline controllers equipped with state-of-the-art exploration strategies such as Random Network Distillation and Model Disagreement.
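
The abstract describes γ-Progress only at a high level, so the sketch below illustrates one plausible reading of a learning-progress curiosity signal: reward attention where a slowly updated “old” copy of the world model predicts worse than the current copy. Such a reward vanishes on unlearnable white noise, since both copies stay equally bad there. All names and hyperparameters here (`WorldModel`, `gamma`, the learning rate) are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch of a learning-progress-style curiosity reward, assuming
# the world model is a simple forward-prediction regressor and an
# exponential-moving-average (EMA) "old" copy lags behind the new model.
import copy
import torch
import torch.nn as nn

class WorldModel(nn.Module):
    def __init__(self, obs_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(), nn.Linear(hidden, obs_dim)
        )

    def forward(self, obs):
        return self.net(obs)

def prediction_loss(model, obs, next_obs):
    # Per-sample self-supervised forward-prediction error.
    return ((model(obs) - next_obs) ** 2).mean(dim=-1)

obs_dim = 8
new_model = WorldModel(obs_dim)
old_model = copy.deepcopy(new_model)   # time-lagged copy of the learner
opt = torch.optim.Adam(new_model.parameters(), lr=1e-3)
gamma = 0.99                           # EMA mixing rate (assumed value)

def curiosity_reward(obs, next_obs):
    # Progress = stale-model loss minus current-model loss: large where the
    # model is actively improving, near zero on already-mastered dynamics
    # and on white noise (both copies remain equally wrong there).
    with torch.no_grad():
        return (prediction_loss(old_model, obs, next_obs)
                - prediction_loss(new_model, obs, next_obs))

def train_step(obs, next_obs):
    loss = prediction_loss(new_model, obs, next_obs).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    # Drift the old model slowly toward the new one.
    with torch.no_grad():
        for p_old, p_new in zip(old_model.parameters(),
                                new_model.parameters()):
            p_old.mul_(gamma).add_(p_new, alpha=1.0 - gamma)

# Toy usage: train on a batch, then score it for curiosity.
obs = torch.randn(32, obs_dim)
next_obs = torch.randn(32, obs_dim)
train_step(obs, next_obs)
print(curiosity_reward(obs, next_obs).mean())
```

Under this reading, an exploration controller would steer toward observations with high `curiosity_reward`: the reward rises while a dynamic is being learned and decays once it is mastered or proves unlearnable, which is the balancing behavior the abstract attributes to γ-Progress.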
