Benchmarking Convolutional Neural Networks for Object Segmentation and Pose Estimation

Convolutional neural networks (CNNs), particularly those designed for object segmentation and pose estimation, are now used in robotic systems that perform mobile manipulation. For these systems to succeed, the CNNs must perform robustly and accurately. To characterize CNN performance, this paper benchmarks several CNN architectures on a set of metrics for object segmentation and pose estimation. The results show that performance on these metrics depends on the complexity of the network architecture. These findings can guide the future development of CNNs for object segmentation and pose estimation.
