World Models

We explore building generative neural network models of popular reinforcement learning environments. Our world model can be trained quickly in an unsupervised manner to learn a compressed spatial and temporal representation of the environment. By using features extracted from the world model as inputs to an agent, we can train a very compact and simple policy that can solve the required task. We can even train our agent entirely inside a hallucinated dream generated by its own world model, and transfer this policy back into the actual environment. An interactive version of this paper is available at https://worldmodels.github.io/
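The data flow described above (compress each observation, keep a temporal state, and feed both to a small policy) can be illustrated with a minimal sketch. This is not the paper's implementation: the class names `ToyWorldModel` and `LinearPolicy`, all dimensions, and the use of untrained random weights are assumptions made only to show how the pieces connect.

```python
import math
import random


def make_matrix(rows, cols, rng, scale=0.1):
    """Return a rows x cols matrix of small random weights."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class ToyWorldModel:
    """Hypothetical stand-in for a learned world model (weights are random, not trained)."""

    def __init__(self, obs_dim, latent_dim, hidden_dim, action_dim, seed=0):
        rng = random.Random(seed)
        # Spatial compression: observation -> small latent vector.
        self.encoder = make_matrix(latent_dim, obs_dim, rng)
        # Temporal state: (latent, action, previous hidden) -> new hidden.
        self.recurrent = make_matrix(
            hidden_dim, latent_dim + action_dim + hidden_dim, rng
        )
        self.hidden = [0.0] * hidden_dim

    def encode(self, observation):
        return [math.tanh(v) for v in matvec(self.encoder, observation)]

    def step(self, latent, action):
        inputs = latent + action + self.hidden
        self.hidden = [math.tanh(v) for v in matvec(self.recurrent, inputs)]
        return self.hidden

    def features(self, observation):
        # Features handed to the agent: current latent plus temporal state.
        return self.encode(observation) + self.hidden


class LinearPolicy:
    """A compact policy: a single linear layer over world-model features."""

    def __init__(self, feature_dim, action_dim, seed=1):
        rng = random.Random(seed)
        self.weights = make_matrix(action_dim, feature_dim, rng)
        self.bias = [0.0] * action_dim

    def act(self, features):
        raw = matvec(self.weights, features)
        return [math.tanh(v + b) for v, b in zip(raw, self.bias)]

    def num_parameters(self):
        return sum(len(row) for row in self.weights) + len(self.bias)


# Illustrative usage with made-up sizes.
model = ToyWorldModel(obs_dim=64, latent_dim=8, hidden_dim=16, action_dim=3)
policy = LinearPolicy(feature_dim=8 + 16, action_dim=3)

obs = [0.5] * 64                       # placeholder observation
feats = model.features(obs)            # compressed features for the agent
action = policy.act(feats)             # the policy sees features, not raw input
model.step(model.encode(obs), action)  # advance the temporal state
```

In this sketch the policy has only `policy.num_parameters()` weights because it operates on the compressed features rather than on the raw observation, which is the sense in which the policy can stay compact.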
