Learning Shared Dynamics with Meta-World Models

Humans possess a mental model of the world: the ability to perceive events and objects from even the most impoverished visual stimuli, enabling rapid decisions and actions. Although the spatial and temporal characteristics of different scenes vary widely, the underlying physics works the same way across environments, so learning an abstract description of these shared physical dynamics helps humans understand the world. In this paper, we explore building such a mental model with neural networks through multi-task learning, which we call the meta-world model. We show through extensive experiments that our proposed meta-world models successfully capture the common dynamics over compact representations of visually different environments from Atari games. We also demonstrate that agents equipped with our meta-world model exhibit visual self-recognition, i.e., they recognize themselves in a mirrored environment derived from the classic mirror self-recognition (MSR) test.
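The core idea above — visually different environments sharing one underlying dynamics — can be sketched as a model with per-environment encoders feeding a single shared transition function. The following is a minimal illustrative sketch, not the paper's actual architecture: the environment names, dimensions, and linear/tanh parameterization are all hypothetical stand-ins for the learned networks.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: each environment has a different raw observation
# size, but all observations are encoded into one shared latent space.
OBS_DIMS = {"pong": 16, "breakout": 24}  # per-environment observation sizes
LATENT_DIM = 8
ACTION_DIM = 4

# Per-environment (task-specific) encoder weights.
encoders = {name: rng.normal(size=(d, LATENT_DIM)) * 0.1
            for name, d in OBS_DIMS.items()}

# One shared dynamics model reused across every environment:
# z_next = tanh([z, a] @ W_shared)
W_shared = rng.normal(size=(LATENT_DIM + ACTION_DIM, LATENT_DIM)) * 0.1

def encode(env, obs):
    """Map a raw observation into the shared latent space."""
    return np.tanh(obs @ encoders[env])

def predict_next(z, action):
    """Shared transition function, applied to any environment's latent."""
    return np.tanh(np.concatenate([z, action]) @ W_shared)

for env, d in OBS_DIMS.items():
    obs = rng.normal(size=d)
    action = np.eye(ACTION_DIM)[0]      # one-hot action
    z = encode(env, obs)
    z_next = predict_next(z, action)    # same dynamics for both games
    print(env, z.shape, z_next.shape)
```

In training, only the encoders would be environment-specific; gradients from every task would update the shared `W_shared`, which is what lets the model distill the common physics.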
