CausalWorld: A Robotic Manipulation Benchmark for Causal Structure and Transfer Learning

Despite recent successes of reinforcement learning (RL), it remains a challenge for agents to transfer learned skills to related environments. To facilitate research addressing this problem, we propose CausalWorld, a benchmark for causal structure and transfer learning in a robotic manipulation environment. The environment is a simulation of an open-source robotic platform, hence offering the possibility of sim-to-real transfer. Tasks consist of constructing 3D shapes from a given set of blocks, inspired by how children learn to build complex structures. The key strength of CausalWorld is that it provides a combinatorial family of such tasks with a common causal structure and underlying factors (e.g., robot and object masses, colors, and sizes). The user (or the agent) may intervene on all causal variables, allowing fine-grained control over how similar different tasks (or task distributions) are. One can thus easily define training and evaluation distributions of a desired difficulty level, targeting a specific form of generalization (e.g., only changes in appearance or object mass). Further, this common parametrization facilitates defining curricula by interpolating between an initial and a target task. While users may define their own task distributions, we present eight meaningful distributions as concrete benchmarks, ranging from simple to very challenging, all of which require long-horizon planning and precise low-level motor control. Finally, we provide baseline results for a subset of these tasks on distinct training curricula and corresponding evaluation protocols, verifying the feasibility of the tasks in this benchmark.
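To make the intervention mechanism concrete, below is a minimal sketch of instantiating a task and intervening on an exposed causal variable. It assumes the open-source causal_world Python package; the identifiers shown (generate_task, CausalWorld, do_intervention, the 'pushing' task id, and the 'stage_color' variable) follow its documented interface, but exact names and signatures may differ across package versions.

    # Minimal sketch, assuming the `causal_world` package's documented API.
    import numpy as np
    from causal_world.task_generators.task import generate_task
    from causal_world.envs.causalworld import CausalWorld

    # Instantiate one task family from the benchmark (here: pushing a block).
    task = generate_task(task_generator_id='pushing')
    env = CausalWorld(task=task, enable_visualization=False)

    # Standard Gym-style interaction loop with random actions.
    obs = env.reset()
    for _ in range(100):
        obs, reward, done, info = env.step(env.action_space.sample())

    # Intervene on a causal variable; here the stage color is resampled.
    # The same mechanism covers, e.g., object masses, sizes, and goal shapes,
    # which is how training/evaluation distributions and curricula are defined.
    success, obs = env.do_intervention({'stage_color': np.random.uniform(0, 1, size=3)})

    env.close()

Because every task in the family is parametrized by the same set of causal variables, a curriculum amounts to scheduling a sequence of such interventions that gradually moves the sampled variables from an initial task toward a target task.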
