Attention Based Visual Analysis for Fast Grasp Planning With a Multi-Fingered Robotic Hand

We present an attention-based visual analysis framework that computes grasp-relevant information to guide grasp planning with a multi-fingered robotic hand. Our approach uses a computational visual attention model to locate regions of interest in a scene and employs a deep convolutional neural network to detect the grasp type and grasp attention point for a sub-region of the object within each region of interest. We demonstrate the proposed framework on object grasping tasks, in which the information it generates serves as a prior to guide grasp planning. The effectiveness of the approach is evaluated in both simulation and real-world experiments. The results show that the proposed framework not only speeds up grasp planning and yields more stable grasp configurations, but also handles unknown objects and cluttered scenes. We also present a new Grasp Type Dataset (GTD) that includes six commonly used grasp types and covers 12 household objects.
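To make the described pipeline concrete, below is a minimal Python sketch of the flow the abstract outlines: a saliency-based attention stage proposes regions of interest, a network predicts a grasp type and grasp attention point per region, and the results are packaged as priors for a grasp planner. This is not the authors' implementation; all function and class names (compute_saliency, extract_regions_of_interest, GraspTypeCNN, plan_grasp) are hypothetical placeholders, and the saliency and prediction bodies are dummies standing in for the real models.

```python
# Hypothetical sketch of an attention-guided grasp-planning pipeline.
# All names are illustrative placeholders, not the authors' code.

import numpy as np


def compute_saliency(image: np.ndarray) -> np.ndarray:
    """Stand-in attention model: return a per-pixel saliency map in [0, 1]."""
    gray = image.mean(axis=2)
    return (gray - gray.min()) / (np.ptp(gray) + 1e-8)


def extract_regions_of_interest(saliency: np.ndarray, threshold: float = 0.8):
    """Threshold the saliency map and return coarse bounding boxes."""
    ys, xs = np.where(saliency > threshold)
    if len(xs) == 0:
        return []
    # A real system would cluster salient pixels into several ROIs;
    # here we return a single coarse box for brevity.
    return [(xs.min(), ys.min(), xs.max(), ys.max())]


class GraspTypeCNN:
    """Stand-in for the deep CNN that predicts a grasp type and a
    grasp attention point for an object sub-region."""

    # Six illustrative labels; the actual GTD grasp types may differ.
    GRASP_TYPES = ["power", "precision", "lateral",
                   "tripod", "spherical", "cylindrical"]

    def predict(self, crop: np.ndarray):
        h, w = crop.shape[:2]
        grasp_type = self.GRASP_TYPES[0]    # dummy prediction
        attention_point = (w // 2, h // 2)  # dummy attention point (crop frame)
        return grasp_type, attention_point


def plan_grasp(image: np.ndarray):
    """End-to-end flow: attention -> ROIs -> grasp type/point -> planner priors."""
    saliency = compute_saliency(image)
    net = GraspTypeCNN()
    priors = []
    for (x0, y0, x1, y1) in extract_regions_of_interest(saliency):
        crop = image[y0:y1 + 1, x0:x1 + 1]
        grasp_type, (u, v) = net.predict(crop)
        # Map the attention point back to image coordinates; a real system
        # would hand this, plus the grasp type, to the grasp planner as a prior.
        priors.append({"type": grasp_type, "point": (x0 + u, y0 + v)})
    return priors


if __name__ == "__main__":
    rgb = (np.random.rand(480, 640, 3) * 255).astype(np.uint8)
    print(plan_grasp(rgb))
```

In this sketch the attention stage restricts where the expensive grasp-type network and the downstream planner must look, which is the mechanism the abstract credits for the speed-up over planning on the full scene.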
