Visuomotor Mechanical Search: Learning to Retrieve Target Objects in Clutter

When searching for objects in cluttered environments, it is often necessary to perform complex interactions that move occluding objects out of the way, fully reveal the object of interest, and make it graspable. Because the physics involved is complex and accurate models of the clutter are lacking, planning and controlling precise predefined interactions with accurate outcomes is extremely hard, if not impossible. In problems where accurate (forward) models are lacking, Deep Reinforcement Learning (RL) has been shown to be a viable way to map observations (e.g. images) to good interactions in the form of closed-loop visuomotor policies. However, Deep RL is sample inefficient and fails when applied directly to the problem of unoccluding objects based on images. In this work we present a novel Deep RL procedure that combines i) teacher-aided exploration, ii) a critic with privileged information, and iii) mid-level representations, resulting in sample-efficient and effective learning for the problem of uncovering a target object occluded by a heap of unknown objects. Our experiments show that our approach trains faster and converges to more efficient uncovering solutions than baselines and ablations, and that our uncovering policies lead to an average improvement in the graspability of the target object, facilitating downstream retrieval applications.
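The three ingredients named above can be illustrated with a toy sketch. It is not the authors' implementation: the abstract gives no architectural details, so every name here (`Transition`, `choose_action`, `linear_critic`, `teacher_prob`) is a hypothetical placeholder, and the linear critic stands in for whatever network the method actually uses. The sketch only shows the information flow: the policy acts on a mid-level representation, a teacher occasionally supplies the exploratory action, and the critic is evaluated on privileged state that is available only during training.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Tuple

Vector = List[float]


@dataclass
class Transition:
    actor_obs: Vector     # mid-level representation seen by the policy (assumed)
    critic_state: Vector  # privileged state, available only in training (assumed)
    action: Vector
    from_teacher: bool    # True if the teacher proposed this action


def choose_action(
    actor_obs: Vector,
    critic_state: Vector,
    policy: Callable[[Vector], Vector],
    teacher: Callable[[Vector], Vector],
    teacher_prob: float,
    rng: random.Random,
) -> Tuple[Vector, bool]:
    """Teacher-aided exploration: with probability teacher_prob the
    (possibly suboptimal) teacher acts, otherwise the learned policy does."""
    if rng.random() < teacher_prob:
        return teacher(critic_state), True
    return policy(actor_obs), False


def linear_critic(weights: Vector, critic_state: Vector, action: Vector) -> float:
    """Toy stand-in for a critic: a linear score over privileged state and
    action. The policy never receives critic_state as input."""
    features = list(critic_state) + list(action)
    return sum(w * f for w, f in zip(weights, features))


if __name__ == "__main__":
    rng = random.Random(0)
    policy = lambda obs: [0.0, 0.0]                 # placeholder learned policy
    teacher = lambda state: [state[0], state[1]]    # placeholder scripted teacher
    obs, state = [0.1, 0.9], [0.5, -0.2]
    action, from_teacher = choose_action(obs, state, policy, teacher, 0.3, rng)
    step = Transition(obs, state, action, from_teacher)
    print(step, linear_critic([1.0, 1.0, 1.0, 1.0], state, action))
```

At deployment only `policy` and its mid-level input would be needed; the teacher and the privileged critic input are training-time aids in this sketch.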
