Mechanical Search: Multi-Step Retrieval of a Target Object Occluded by Clutter

When operating in unstructured environments such as warehouses, homes, and retail centers, robots are frequently required to interactively search for and retrieve specific objects from cluttered bins, shelves, or tables. Mechanical Search describes the class of tasks in which the goal is to locate and extract a known target object. In this paper, we formalize Mechanical Search and study a version in which distractor objects are heaped over the target object in a bin. The robot uses an RGBD perception system and control policies to iteratively select, parameterize, and perform one of three actions (push, suction, grasp) until the target object is extracted, a time limit is exceeded, or no high-confidence push or grasp is available. We present a study of five algorithmic policies for Mechanical Search, with 15,000 simulated trials and 300 physical trials on heaps ranging from 10 to 20 objects. Results suggest that algorithmic policies can succeed at this long-horizon task in over 95% of instances and that the number of actions required scales approximately linearly with the size of the heap. Code and supplementary material can be found at http://ai.stanford.edu/mech-search.
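The iterative select-parameterize-execute loop described above can be sketched as follows. This is a minimal illustration, not the paper's implementation; the observation dictionary, the `ActionProposal` type, and the callback names (`perceive`, `propose`, `execute`) are all hypothetical stand-ins for the RGBD perception system and the action policies.

```python
from dataclasses import dataclass

# The three action types available to the policy, per the abstract.
ACTIONS = ("push", "suction", "grasp")

@dataclass
class ActionProposal:
    kind: str          # one of ACTIONS
    confidence: float  # policy's estimated success probability

def mechanical_search(perceive, propose, execute,
                      target_id, max_steps=20, min_conf=0.5):
    """Iteratively select and execute actions until the target is
    extracted, the step budget is spent, or no confident action remains."""
    for step in range(max_steps):
        obs = perceive()                    # RGBD observation of the bin
        if obs.get("extracted") == target_id:
            return "success", step
        proposal = propose(obs)             # best push/suction/grasp proposal
        if proposal is None or proposal.confidence < min_conf:
            return "no_confident_action", step
        execute(proposal)
    return "timeout", max_steps
```

The three termination conditions of the loop mirror the abstract: successful extraction, exhaustion of the time (step) budget, and absence of any high-confidence action.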
