Learning to Detect Visual Grasp Affordance

Appearance-based estimation of grasp affordances is desirable when 3-D scans become unreliable due to clutter or material properties. We develop a general framework for estimating grasp affordances from 2-D sources, including local texture-like measures as well as object-category measures that capture previously learned grasp strategies. Local approaches to estimating grasp positions have been shown to be effective in real-world scenarios, but are unable to impart object-level biases and can be prone to false positives. We describe how global cues can be used to compute continuous pose estimates and corresponding grasp point locations, using a max-margin optimization for category-level continuous pose regression. We provide a novel dataset to evaluate visual grasp affordance estimation; on this dataset we show that a fused method outperforms either local or global methods alone, and that continuous pose estimation improves over discrete output models. Finally, we demonstrate our autonomous object detection and grasping system on the Willow Garage PR2 robot.
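The abstract does not spell out the max-margin pose regressor, so the following is only a rough, hedged illustration of what "max-margin optimization for continuous pose regression" can look like. It uses a generic linear epsilon-insensitive support vector regression (a standard max-margin regression objective) trained by stochastic subgradient descent, not the authors' formulation. A continuous viewpoint angle is predicted by regressing its cosine and sine separately and recombining them with `atan2`, which avoids the wraparound problem of regressing raw angles. The cos/sin encoding, the hypothetical `synthetic_features` function (a stand-in for real appearance descriptors such as HOG), and all hyperparameters (`C`, `epsilon`, `lr`, `epochs`) are assumptions made for this sketch.

```python
import math
import random


def train_linear_svr(X, y, C=1.0, epsilon=0.05, lr=0.01, epochs=100, seed=0):
    """Linear epsilon-insensitive SVR fit by stochastic subgradient descent.

    Minimizes 0.5 * ||w||^2 + C * sum_i max(0, |w.x_i + b - y_i| - epsilon),
    a generic max-margin regression objective (an assumption, not the paper's).
    """
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    order = list(range(n))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            residual = sum(wj * xj for wj, xj in zip(w, X[i])) + b - y[i]
            # Subgradient of the epsilon-insensitive loss: zero inside the tube.
            g = 1.0 if residual > epsilon else (-1.0 if residual < -epsilon else 0.0)
            # The regularizer gradient w is spread evenly over the n per-sample steps.
            w = [wj - lr * (wj / n + C * g * xj) for wj, xj in zip(w, X[i])]
            b -= lr * C * g
    return w, b


def predict(w, b, x):
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def fit_pose_regressor(X, angles, **kw):
    # Regress (cos, sin) so that angles near +pi and -pi get nearby targets.
    cos_model = train_linear_svr(X, [math.cos(a) for a in angles], **kw)
    sin_model = train_linear_svr(X, [math.sin(a) for a in angles], **kw)
    return cos_model, sin_model


def predict_pose(models, x):
    (wc, bc), (ws, bs) = models
    # atan2 is unaffected by a common positive scale on (sin, cos), so similar
    # margin-induced shrinkage in both regressors does not bias the angle.
    return math.atan2(predict(ws, bs, x), predict(wc, bc, x))


def synthetic_features(angle, rng):
    # Hypothetical stand-in for image features: two noisy pose-dependent
    # channels plus one uninformative channel.
    return [math.cos(angle) + rng.gauss(0, 0.02),
            math.sin(angle) + rng.gauss(0, 0.02),
            rng.gauss(0, 1.0)]


def angular_error(a, b):
    # Smallest absolute difference between two angles, in radians.
    return abs((a - b + math.pi) % (2 * math.pi) - math.pi)


rng = random.Random(1)
train_angles = [rng.uniform(-math.pi, math.pi) for _ in range(200)]
X_train = [synthetic_features(a, rng) for a in train_angles]
models = fit_pose_regressor(X_train, train_angles)

test_angles = [rng.uniform(-math.pi, math.pi) for _ in range(50)]
errors = [angular_error(predict_pose(models, synthetic_features(a, rng)), a)
          for a in test_angles]
mean_error = sum(errors) / len(errors)
print(f"mean angular error: {mean_error:.3f} rad")
```

Because the output is a continuous angle rather than one of a fixed set of viewpoint bins, a grasp point attached to the object model could in principle be placed at any orientation, which is the kind of advantage over discrete output models that the abstract reports.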
