Hi-Fi: Hierarchical Feature Integration for Skeleton Detection

In natural images, skeleton scale (thickness) may vary significantly among objects and object parts. Robust skeleton detection therefore requires stronger multi-scale feature integration than many other vision tasks. In this paper, we present a new convolutional neural network (CNN) architecture that introduces a novel hierarchical feature integration mechanism, named Hi-Fi, to address the object skeleton detection problem. CNNs capture high-level semantics in deeper layers and low-level details in shallower layers. By hierarchically integrating convolutional features with bidirectional guidance, our approach (1) enables mutual refinement across features of different levels, and (2) captures both rich object context and high-resolution details. Experimental results on several benchmarks show that our method fuses features from very different scales more effectively than state-of-the-art methods, as evidenced by a considerable performance improvement on skeleton detection. Our method also advances the state of the art on the boundary detection task.
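The idea of integrating features level by level, rather than merging all scales in one step, can be illustrated with a small NumPy sketch. It is not the Hi-Fi architecture: the function names are hypothetical, the fusion rule (nearest-neighbour upsampling followed by averaging) is a stand-in for learned convolutions, and only the coarse-to-fine direction is shown, not the bidirectional guidance described above. It also assumes every level has the same channel count and that each level is exactly half the resolution of the previous one.

```python
import numpy as np

def upsample_nearest(x, factor):
    """Nearest-neighbour upsampling of a (C, H, W) feature map."""
    return x.repeat(factor, axis=1).repeat(factor, axis=2)

def fuse_pair(fine, coarse):
    """Fuse two adjacent levels at the finer resolution.

    Assumption: averaging replaces the learned fusion a real network would use.
    """
    factor = fine.shape[1] // coarse.shape[1]
    return 0.5 * (fine + upsample_nearest(coarse, factor))

def hierarchical_integrate(features):
    """Fuse adjacent levels repeatedly until a single map remains.

    `features` is ordered from shallow (high resolution) to deep (low resolution).
    Each pass fuses every neighbouring pair, so n levels become n - 1 levels.
    """
    levels = list(features)
    while len(levels) > 1:
        levels = [fuse_pair(levels[i], levels[i + 1])
                  for i in range(len(levels) - 1)]
    return levels[0]

# Four synthetic levels with 4 channels at resolutions 32, 16, 8 and 4.
rng = np.random.default_rng(0)
feats = [rng.standard_normal((4, 32 // 2**k, 32 // 2**k)) for k in range(4)]
fused = hierarchical_integrate(feats)
print(fused.shape)
```

Because each pass only combines neighbouring levels, features of very different scales meet only after intermediate levels have already been merged, which is the intuition behind integrating hierarchically instead of concatenating all side outputs at once.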
