Learning RGB-D descriptors of garment parts for informed robot grasping

Robotic handling of textile objects in household environments is an emerging application that has recently received considerable attention thanks to the development of domestic robots. Most current approaches follow a multiple re-grasp strategy, in which clothes are sequentially grasped from different points until one of the grasps yields a desired configuration. In this work we propose a vision-based method, built on the Bag of Visual Words approach, that combines appearance and 3D information to detect garment parts suitable for grasping, even when the clothes are highly wrinkled. We also contribute a new annotated garment part dataset that can be used for benchmarking classification, part detection, and segmentation algorithms. We use the dataset to evaluate our approach and several state-of-the-art 3D descriptors on the task of garment part detection. Results indicate that appearance is a reliable source of information, but that augmenting it with 3D information can help the method perform better on new clothing items.
