Paper Information - Hi-Fi: Hierarchical Feature Integration for Skeleton Detection

In natural images, skeleton scale (thickness) can vary significantly among objects and object parts. Robust skeleton detection therefore requires a stronger ability to integrate multi-scale features than most other vision tasks. In this paper, we present a new convolutional neural network (CNN) architecture that introduces a novel hierarchical feature integration mechanism, named Hi-Fi, to address the object skeleton detection problem. CNNs capture high-level semantics in their deeper layers and low-level details in their shallower layers. By hierarchically integrating convolutional features with bidirectional guidance, our approach (1) enables mutual refinement across features of different levels, and (2) captures both rich object context and high-resolution details. Experimental results on several benchmarks show that our method fuses features from very different scales more effectively than state-of-the-art methods, as evidenced by a considerable performance improvement on skeleton detection. Our method also sets a new state of the art on the boundary detection task.
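To make the idea of hierarchical integration concrete, the following is a minimal NumPy sketch of one possible reading of it: feature maps from neighbouring levels are fused pairwise, and the fused results are fused again until a single map remains. Everything here is an assumption for illustration only. The function names (`upsample_nearest`, `fuse_adjacent`, `hierarchical_integrate`) are hypothetical, plain averaging stands in for the learned convolutions a real network would use, and the sketch does not reproduce the bidirectional guidance, supervision, or training described in the paper.

```python
import numpy as np


def upsample_nearest(x, factor):
    """Nearest-neighbour upsampling of an (H, W, C) feature map."""
    return x.repeat(factor, axis=0).repeat(factor, axis=1)


def fuse_adjacent(shallow, deep):
    """Fuse two neighbouring levels at the resolution of the shallower one.

    Averaging is a placeholder (assumption) for a learned fusion layer.
    """
    factor = shallow.shape[0] // deep.shape[0]
    deep_up = upsample_nearest(deep, factor)
    return 0.5 * (shallow + deep_up)


def hierarchical_integrate(features):
    """Repeatedly fuse adjacent levels until one feature map remains.

    `features` is ordered from shallow (high resolution) to deep (low
    resolution). Returns every intermediate hierarchy, so
    result[-1][0] is the final integrated map.
    """
    levels = [list(features)]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        fused = [fuse_adjacent(prev[i], prev[i + 1]) for i in range(len(prev) - 1)]
        levels.append(fused)
    return levels


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Four hypothetical side outputs at strides 1, 2, 4, 8 with 8 channels each.
    feats = [rng.standard_normal((32 // 2**i, 32 // 2**i, 8)) for i in range(4)]
    hierarchy = hierarchical_integrate(feats)
    for depth, maps in enumerate(hierarchy):
        print(depth, [m.shape for m in maps])
```

Each pass shortens the list by one, so four input levels yield three, then two, then one fused map, and the final map mixes information from every input scale at the highest resolution.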
