CausalWorld: A Robotic Manipulation Benchmark for Causal Structure and Transfer Learning

Despite recent successes of reinforcement learning (RL), it remains a challenge for agents to transfer learned skills to related environments. To facilitate research addressing this problem, we propose CausalWorld, a benchmark for causal structure and transfer learning in a robotic manipulation environment. The environment is a simulation of an open-source robotic platform, hence offering the possibility of sim-to-real transfer. Tasks consist of constructing 3D shapes from a given set of blocks, inspired by how children learn to build complex structures. The key strength of CausalWorld is that it provides a combinatorial family of such tasks with common causal structure and underlying factors (including, e.g., robot and object masses, colors, sizes). The user (or the agent) may intervene on all causal variables, which allows for fine-grained control over how similar different tasks (or task distributions) are. One can thus easily define training and evaluation distributions of a desired difficulty level, targeting a specific form of generalization (e.g., only changes in appearance or object mass). Further, this common parametrization facilitates defining curricula by interpolating between an initial and a target task. While users may define their own task distributions, we present eight meaningful distributions as concrete benchmarks, ranging from simple to very challenging, all of which require long-horizon planning as well as precise low-level motor control. Finally, we provide baseline results for a subset of these tasks on distinct training curricula and corresponding evaluation protocols, verifying the feasibility of the tasks in this benchmark.
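To make the intervention interface described in the abstract concrete, the following minimal Python sketch follows the usage pattern shown in the project's public repository (github.com/rr-learning/CausalWorld). The import paths, the `generate_task` / `do_intervention` calls, and the `'pushing'` task generator id are taken from that README and should be treated as assumptions here, not as part of the paper itself.

```python
import numpy as np
from causal_world.envs import CausalWorld
from causal_world.task_generators import generate_task

# Instantiate one member of the combinatorial task family
# ('pushing' is one of the task generator ids listed in the repo README).
task = generate_task(task_generator_id='pushing')
env = CausalWorld(task=task)

# Standard Gym-style interaction loop with a random policy.
obs = env.reset()
for _ in range(100):
    obs, reward, done, info = env.step(env.action_space.sample())

# Intervene on an exposed causal variable (here: the stage color),
# changing the appearance of the environment while leaving the goal intact.
env.do_intervention({'stage_color': np.random.uniform(0, 1, size=3)})

env.close()
```

Because every task in the family exposes the same underlying variables, the same intervention mechanism can be used to hold out specific factors for evaluation, or to interpolate a curriculum between an initial and a target task.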
