OPEn: An Open-ended Physics Environment for Learning Without a Task

Humans have mental models that allow them to plan, experiment, and reason in the physical world. How should an intelligent agent learn such models? In this paper, we study whether models of the world learned in an open-ended physics environment, without any specific task, can be reused for downstream physics reasoning tasks. To this end, we build the Open-ended Physics Environment (OPEn) benchmark and design several tasks that explicitly test how well representations learned in this environment transfer. This setting reflects the conditions real agents (i.e., rolling robots) face when placed in a new kind of environment: they must adapt without any teacher to tell them how the environment works. The setting is challenging because it requires solving an exploration problem on top of a model-building and representation-learning problem. We test several existing RL-based exploration methods on this benchmark and find that an agent combining unsupervised contrastive learning for representation learning with impact-driven exploration achieves the best results. However, all models still fall short in sample efficiency when transferring to the downstream tasks. We expect that OPEn will encourage the development of novel rolling robot agents that can build reusable mental models of the world that facilitate many tasks.
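To make the two ingredients of the best-performing agent concrete, the following is a minimal PyTorch sketch of a CURL-style contrastive (InfoNCE) loss and a RIDE-style impact-driven intrinsic reward. It is illustrative only: the function names, shapes, and the episodic-count discounting are our assumptions for exposition, not a reference implementation of the benchmark agents.

    import torch
    import torch.nn.functional as F

    def contrastive_loss(q, k, W):
        # InfoNCE objective between embeddings of two augmented views of the
        # same observation (CURL-style). q, k: (batch, dim) embeddings;
        # W: (dim, dim) learned bilinear similarity matrix.
        logits = q @ W @ k.t()                                    # pairwise similarities
        logits = logits - logits.max(dim=1, keepdim=True).values  # numerical stability
        labels = torch.arange(q.size(0), device=q.device)         # positives on the diagonal
        return F.cross_entropy(logits, labels)

    def impact_reward(phi_s, phi_s_next, episodic_count):
        # RIDE-style intrinsic reward: the "impact" of an action is the change
        # it causes in the learned latent state, discounted by how often the
        # resulting state has already been visited in the current episode.
        impact = torch.norm(phi_s_next - phi_s, p=2, dim=-1)
        return impact / episodic_count.float().clamp(min=1.0).sqrt()

Because OPEn provides no task reward during exploration, an intrinsic reward of this form is the agent's only learning signal: it is pushed toward actions that visibly change the latent state, which in a physics environment favors interacting with objects.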
