World Discovery Models

As humans, we are driven by a strong desire to seek novelty in our world. Moreover, upon observing a novel pattern, we are capable of refining our understanding of the world based on the new information: humans can discover their world. This outstanding capacity of the human mind for discovery has led to many breakthroughs in science, art, and technology. Here we investigate the possibility of building an agent capable of discovering its world using modern AI technology. In particular, we introduce NDIGO (Neural Differential Information Gain Optimisation), a self-supervised discovery model that seeks out new information in order to construct a global view of its world from partial and noisy observations. Our experiments on controlled 2-D navigation tasks show that NDIGO outperforms state-of-the-art information-seeking methods in terms of the quality of the learned representation. The improvement is particularly significant in the presence of white or structured noise, where other information-seeking methods chase the noise instead of discovering their world.
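The abstract does not spell out NDIGO's reward, so the following is a minimal illustrative sketch of the differential-information-gain idea the name alludes to: an observation is rewarded by how much it reduces the agent's loss at predicting a future observation. All names here (`ndigo_reward`, `pred_without_ot`, `pred_with_ot`) are hypothetical, and the categorical probability vectors stand in for whatever predictive model the agent actually learns.

```python
import numpy as np

def cross_entropy(pred_probs, obs):
    """Negative log-likelihood of the realised observation under a
    categorical prediction (a probability vector over observations)."""
    return -np.log(pred_probs[obs] + 1e-8)

def ndigo_reward(pred_without_ot, pred_with_ot, future_obs):
    """Illustrative intrinsic reward for observation o_t: how much seeing
    o_t reduces the loss of predicting a future observation o_{t+k}.
    Unlearnable noise is equally unpredictable with or without o_t, so
    the two losses cancel and the reward stays near zero."""
    loss_without = cross_entropy(pred_without_ot, future_obs)
    loss_with = cross_entropy(pred_with_ot, future_obs)
    return loss_without - loss_with

# Toy usage: seeing o_t sharpens the prediction of o_{t+k} from a
# near-uniform guess to a confident one, yielding a positive reward.
before = np.array([0.25, 0.25, 0.25, 0.25])  # prediction of o_{t+k} given history up to t-1
after = np.array([0.05, 0.85, 0.05, 0.05])   # prediction of o_{t+k} after also seeing o_t
print(ndigo_reward(before, after, future_obs=1))  # ~ +1.22 nats gained
```

The differential form is one way to read the abstract's noise-robustness claim: a reward built from a single prediction loss stays high on white noise forever, whereas a difference of two losses on the same target cancels out whatever neither prediction can learn.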
