Paper information - SegICP: Integrated deep semantic segmentation and pose estimation

Recent robotic manipulation competitions have highlighted that sophisticated robots still struggle to achieve fast and reliable perception of task-relevant objects in complex, realistic scenarios. To improve the speed and robustness of perception in these systems, we present SegICP, a novel integrated solution to object recognition and pose estimation. SegICP couples convolutional neural networks with multi-hypothesis point cloud registration to achieve both robust pixel-wise semantic segmentation and accurate, real-time 6-DOF pose estimation for relevant objects. Our architecture achieves 1 cm position error and < 5° angle error in real time without an initial seed. We evaluate and benchmark SegICP against an annotated dataset generated by motion capture.
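
The abstract names the two stages but gives no implementation detail, so the following is only a minimal sketch of the general idea, not the authors' code. It assumes an organized point cloud aligned with a per-pixel label image, uses brute-force nearest neighbours, and seeds the hypotheses with evenly spaced rotations about the z axis; all function names (`crop_by_label`, `icp`, `multi_hypothesis_icp`) and the seeding scheme are hypothetical choices made for illustration.

```python
import numpy as np


def crop_by_label(points, labels, object_id):
    """Keep the 3-D points whose pixel was labelled `object_id`.

    points: (H, W, 3) organized cloud; labels: (H, W) integer label image.
    Rows containing NaN or inf (missing depth) are dropped.
    """
    selected = points[labels == object_id]
    return selected[np.isfinite(selected).all(axis=1)]


def best_rigid_transform(src, dst):
    """Least-squares R, t with dst ~= src @ R.T + t (Kabsch, paired points)."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)
    U, _, Vt = np.linalg.svd(H)
    # Guard against a reflection so that R is a proper rotation.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t


def nearest_sq_dists(a, b):
    """Squared distance from every point in a to every point in b."""
    return ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=2)


def icp(model, scene, R0, iters=30):
    """Point-to-point ICP started from rotation R0; returns (R, t, rms)."""
    R = R0
    t = scene.mean(axis=0) - R @ model.mean(axis=0)  # align centroids first
    for _ in range(iters):
        moved = model @ R.T + t
        nearest = scene[nearest_sq_dists(moved, scene).argmin(axis=1)]
        R, t = best_rigid_transform(model, nearest)
    moved = model @ R.T + t
    rms = np.sqrt(nearest_sq_dists(moved, scene).min(axis=1).mean())
    return R, t, rms


def rot_z(theta):
    """Rotation by theta radians about the z axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def multi_hypothesis_icp(model, scene, n_hyp=8):
    """Run ICP from n_hyp initial rotations and keep the lowest-error result."""
    results = [icp(model, scene, rot_z(2 * np.pi * k / n_hyp))
               for k in range(n_hyp)]
    return min(results, key=lambda r: r[2])
```

In use, `scene = crop_by_label(points, labels, object_id)` would isolate the segmented object and `multi_hypothesis_icp(model, scene)` would return the rotation, translation, and residual of the best hypothesis. Running several seeds is what removes the need for an initial pose guess, at the cost of one ICP run per seed; a real-time system would replace the brute-force neighbour search with a spatial index.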
