Rearrangement: A Challenge for Embodied AI

We describe a framework for research and evaluation in Embodied AI. Our proposal is based on a canonical task: Rearrangement. A standard task can focus the development of new techniques and serve as a source of trained models that can be transferred to other settings. In the rearrangement task, the goal is to bring a given physical environment into a specified state. The goal state can be specified by object poses, by images, by a description in language, or by letting the agent experience the environment in the goal state. We characterize rearrangement scenarios along different axes and describe metrics for benchmarking rearrangement performance. To facilitate research and exploration, we present experimental testbeds of rearrangement scenarios in four different simulation environments. We anticipate that other datasets will be released and new simulation platforms will be built to support training of rearrangement agents and their deployment on physical systems.
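As a rough illustration of the simplest goal specification mentioned above (object poses), the sketch below checks how much of a goal state has been reached. It is a minimal sketch under stated assumptions, not the metric defined by the authors: the names `Pose`, `fraction_rearranged`, and `is_rearranged`, the position-only pose, and the 5 cm tolerance are all hypothetical choices made for this example.

```python
import math
from typing import Dict, Tuple

# Hypothetical pose type: (x, y, z) position in metres.
# Orientation is omitted to keep the sketch short.
Pose = Tuple[float, float, float]


def fraction_rearranged(
    current: Dict[str, Pose],
    goal: Dict[str, Pose],
    tol: float = 0.05,  # assumed distance tolerance, not taken from the paper
) -> float:
    """Return the fraction of goal objects lying within `tol` of their goal position."""
    if not goal:
        return 1.0  # an empty goal is trivially satisfied
    placed = sum(
        1
        for name, target in goal.items()
        if name in current and math.dist(current[name], target) <= tol
    )
    return placed / len(goal)


def is_rearranged(
    current: Dict[str, Pose], goal: Dict[str, Pose], tol: float = 0.05
) -> bool:
    """True only when every goal object is within tolerance of its goal position."""
    return fraction_rearranged(current, goal, tol) == 1.0
```

A fractional score of this kind gives partial credit for partly completed rearrangements, while the boolean check corresponds to a strict success criterion; goals given as images, language, or experience would need a different comparison.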
