Learning Task-Oriented Grasping From Human Activity Datasets

We propose to leverage a real-world, human-activity RGB dataset to teach a robot Task-Oriented Grasping (TOG). We develop a model that takes an RGB image as input and outputs the hand pose and configuration as well as the object pose and shape. We build on the insight that jointly estimating the hand and object poses yields higher accuracy than estimating these quantities independently. Given the trained model, we process an RGB dataset to automatically obtain the data needed to train a TOG model, which takes an object point cloud as input and outputs a region suitable for task-specific grasping. Our ablation study shows that training the object pose predictor with hand pose information (and vice versa) outperforms training without this information. Furthermore, our results on a real-world dataset demonstrate that our method is applicable and competitive with the state of the art. Experiments with a robot show that our method enables a robot to perform TOG on novel objects.
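
The two-stage pipeline described above (a joint hand-object estimation network that labels RGB frames, followed by a TOG network trained on the resulting data) can be sketched as follows. This is a minimal illustration of the data flow only, assuming hypothetical module names (HandObjectNet, TOGNet), a placeholder backbone, and tensor shapes; it is not the authors' implementation.

```python
# Hypothetical sketch of the two-stage pipeline (not the authors' code).
import torch
import torch.nn as nn


class HandObjectNet(nn.Module):
    """Stage 1: joint hand/object estimation from a single RGB image.

    Both heads condition on shared features, reflecting the paper's
    insight that joint estimation beats independent estimation.
    """

    def __init__(self, feat_dim: int = 256):
        super().__init__()
        # Shared convolutional encoder (placeholder backbone).
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feat_dim), nn.ReLU(),
        )
        # Hand branch: 21 joints x 3D coordinates (assumed keypoint layout).
        self.hand_head = nn.Linear(feat_dim, 21 * 3)
        # Object branch: 6D pose (translation + rotation) + a shape code.
        self.object_head = nn.Linear(feat_dim, 6 + 32)

    def forward(self, rgb: torch.Tensor):
        feat = self.encoder(rgb)
        hand = self.hand_head(feat).view(-1, 21, 3)
        obj = self.object_head(feat)
        return hand, obj[:, :6], obj[:, 6:]


class TOGNet(nn.Module):
    """Stage 2: per-point task-grasp suitability over an object point cloud."""

    def __init__(self):
        super().__init__()
        # Per-point MLP; a PointNet-style network would be used in practice.
        self.mlp = nn.Sequential(
            nn.Linear(3, 64), nn.ReLU(),
            nn.Linear(64, 1),
        )

    def forward(self, points: torch.Tensor):
        # points: (B, N, 3) -> (B, N) logits marking the graspable region.
        return self.mlp(points).squeeze(-1)


# Usage: stage 1 labels RGB frames; stage 2 consumes object point clouds.
rgb = torch.rand(1, 3, 224, 224)
hand_pose, obj_pose, obj_shape = HandObjectNet()(rgb)
region_logits = TOGNet()(torch.rand(1, 1024, 3))
```

The shared encoder is the point of the joint formulation: because both heads read the same features, hand context can inform the object prediction and vice versa, which is the effect the ablation study measures.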
