Learning to Segment Generic Handheld Objects Using Class-Agnostic Deep Comparison and Segmentation Network

Learning unknown objects in the environment is important for detection and manipulation tasks. Before unknown objects can be learned, ground-truth labels have to be provided. Data annotation can be achieved in a number of ways, but the most widely used method is still manual annotation. Although manual annotation has shown superior performance, it is time-consuming and limits robots’ capabilities to known object instances. This letter addresses these limitations and presents a method that allows robots to autonomously annotate objects from observations of human–object interactions. Specifically, we present a novel method that segments handheld objects in real time using a class-agnostic deep comparison and segmentation network. The inputs to the network are the RGB-D data of a known object template and of a search space; the outputs are a pixel-wise label of the object and an objectness score. The score indicates the likelihood that the same object is present in both inputs. The object template is manually initialized in the first frame; thereafter, the object is segmented and the template is updated online, with the likelihood score deciding when an update takes place. The segmented object regions are accumulated as pseudo-ground-truth labels, which are used for object learning. The approach efficiently handles both rigid and highly deformable objects.
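The online loop described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `OnlineSegmenter`, `toy_network`, the 0.5 threshold, and the rule of reusing a confident frame as the next template are all hypothetical. The abstract says only that the objectness score governs the template update, and gives neither a threshold nor an update rule. The toy network operates on short lists instead of RGB-D images so that the example runs without dependencies.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List, Tuple

# Hypothetical stand-in for the network: (template, search) -> (mask, objectness score).
Network = Callable[[Any, Any], Tuple[Any, float]]


@dataclass
class OnlineSegmenter:
    network: Network
    template: Any                    # template, set manually in the first frame
    update_threshold: float = 0.5    # assumed value, not given in the source
    pseudo_labels: List[Any] = field(default_factory=list)

    def step(self, search: Any) -> Tuple[Any, float]:
        """Segment one frame; update template and labels only on confident matches."""
        mask, score = self.network(self.template, search)
        if score >= self.update_threshold:
            self.pseudo_labels.append(mask)  # accumulate a pseudo-ground-truth label
            self.template = search           # assumed update rule: reuse the confident frame
        return mask, score


def toy_network(template, search):
    """Toy comparison: the mask marks matching entries, the score is their fraction."""
    mask = [int(t == s) for t, s in zip(template, search)]
    return mask, sum(mask) / len(mask)


seg = OnlineSegmenter(network=toy_network, template=[1, 2, 3, 4])
seg.step([1, 2, 3, 9])  # 3 of 4 entries match: label kept, template updated
seg.step([7, 7, 7, 7])  # nothing matches: label discarded, template unchanged
```

Gating both the template update and the label accumulation on the score is one plausible way to keep a poor segmentation from corrupting later frames or the training labels.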
