Deep learning for detecting robotic grasps

We consider the problem of detecting robotic grasps in an RGB-D view of a scene containing objects. We apply a deep learning approach to this problem, which avoids time-consuming hand-design of features. This presents two main challenges. First, we need to evaluate a huge number of candidate grasps. To make detection fast and robust, we present a two-step cascaded system with two deep networks, where the top detections from the first are re-evaluated by the second. The first network has fewer features, is faster to run, and can effectively prune out unlikely candidate grasps. The second, with more features, is slower but has to run only on the top few detections. Second, we need to handle multimodal inputs effectively, for which we present a method that applies structured regularization to the weights based on multimodal group regularization. We show that our method improves performance on an RGB-D robotic grasping dataset, and can be used to successfully execute grasps on two different robotic platforms.
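The two-step cascade described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `cascade_detect`, the scoring callables `fast_score` and `slow_score`, and the parameter `top_k` are hypothetical stand-ins for the smaller first network, the larger second network, and the number of detections passed between them.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

G = TypeVar("G")  # a candidate grasp, in whatever representation is used


def cascade_detect(
    candidates: Sequence[G],
    fast_score: Callable[[G], float],  # assumed: cheap scorer (stage 1)
    slow_score: Callable[[G], float],  # assumed: expensive scorer (stage 2)
    top_k: int,
) -> List[Tuple[float, G]]:
    """Return the top_k stage-1 candidates, re-ranked by the stage-2 score."""
    if top_k <= 0 or not candidates:
        return []

    # Stage 1: score every candidate with the cheap model and keep the best few.
    shortlist = sorted(candidates, key=fast_score, reverse=True)[:top_k]

    # Stage 2: run the expensive model only on the shortlist and re-rank.
    rescored = [(slow_score(g), g) for g in shortlist]
    rescored.sort(key=lambda pair: pair[0], reverse=True)
    return rescored
```

With N candidates, the expensive scorer runs only `top_k` times rather than N times, which is where the speed-up of a cascade comes from. The first element of the returned list is the highest-scoring detection under the second-stage scorer.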
