Task-oriented grasping with semantic and geometric scene understanding

We present a task-oriented grasp model that encodes grasps which are configurationally compatible with a given task. For instance, if the task is to pour liquid from a container, the model encodes grasps that leave the opening of the container unobstructed. The model consists of two independent agents. The first is a geometric grasp model that computes, from a depth image, a distribution of 6D grasp poses for which the shape of the gripper matches the shape of the underlying surface; it relies on a dictionary of geometric object parts annotated with workable gripper poses and preshape parameters, and it is learned from experience via kinesthetic teaching. The second is a CNN-based semantic model that identifies grasp-suitable regions in a depth image, i.e., regions where a grasp will not impede the execution of the task; it allows us to encode relationships such as "grasp from the handle." A key element of this work is the use of a deep network to integrate contextual task cues, while deferring the structured-output problem of gripper-pose computation to an explicit, learned geometric model. Jointly, the two models generate grasps that are mechanically fit and that grip the object in a way that enables the intended task.
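
To make the joint generation concrete, the following is a minimal sketch, not the authors' implementation, of how candidates from a geometric grasp model could be re-ranked by a per-pixel task-suitability map produced by the semantic CNN: geometric fit and semantic suitability multiply into a joint score, so grasps on task-unsuitable regions are suppressed. The function names, array shapes, and the camera-projection helper are illustrative assumptions.

```python
import numpy as np

def combine_grasps(grasp_poses, geometric_scores, suitability_map, project_to_pixel):
    """Rank grasp candidates by geometric fit x semantic task suitability.

    grasp_poses      : (N, 4, 4) array of 6D gripper poses as homogeneous transforms
    geometric_scores : (N,) shape-matching scores from the geometric grasp model
    suitability_map  : (H, W) per-pixel task suitability from the semantic CNN, in [0, 1]
    project_to_pixel : camera model mapping a 3D point to integer image coordinates (u, v)
    """
    scores = np.empty(len(grasp_poses))
    for i, (pose, g) in enumerate(zip(grasp_poses, geometric_scores)):
        u, v = project_to_pixel(pose[:3, 3])  # grasp contact point in the depth image
        s = suitability_map[v, u]             # high near "grasp here" regions (e.g., a handle)
        scores[i] = g * s                     # joint score: mechanically fit AND task-compatible
    order = np.argsort(-scores)               # best candidates first
    return grasp_poses[order], scores[order]
```

Under this reading, a grasp must satisfy both agents at once: a pose with excellent surface contact still scores near zero if it covers the container's opening during a pouring task.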
