Multi-Scale Bidirectional FCN for Object Skeleton Extraction

The performance of state-of-the-art object skeleton detection (OSD) methods has been greatly boosted by Convolutional Neural Networks (CNNs). However, most existing CNN-based OSD methods rely on a 'skip-layer' structure in which low-level and high-level features are combined to gather multi-level contextual information. Unfortunately, because shallow features tend to be noisy and lack semantic knowledge, they can introduce errors and reduce accuracy. Therefore, to improve the accuracy of object skeleton detection, we propose a novel network architecture, the Multi-Scale Bidirectional Fully Convolutional Network (MSB-FCN), to better gather and enhance multi-scale high-level contextual information. Its advantage is that it uses only deep features to construct multi-scale feature representations, together with a bidirectional structure that better captures contextual knowledge. This enables the proposed MSB-FCN to learn semantic-level information from different sub-regions. Moreover, we introduce dense connections into the bidirectional structure so that the learning process at each scale can directly encode information from all other scales. An attention pyramid is also integrated into MSB-FCN to dynamically control information propagation and suppress unreliable features. Extensive experiments on various benchmarks demonstrate that the proposed MSB-FCN achieves significant improvements over state-of-the-art algorithms.
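The data flow described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the pooling sizes (1, 2, 4), and the parameter-free sigmoid gate standing in for a learned attention pyramid are all assumptions made for illustration, and no convolutions or training are involved. It only shows one plausible reading of the abstract: several scales are built from a single deep feature map, each scale in a directional pass receives the gated outputs of every scale visited before it (dense connections), and the two passes are merged.

```python
import numpy as np

def avg_pool(x, k):
    """Average-pool a (C, H, W) map with a k x k window and stride k."""
    c, h, w = x.shape
    x = x[:, :h - h % k, :w - w % k]
    return x.reshape(c, h // k, k, w // k, k).mean(axis=(2, 4))

def resize(x, size):
    """Nearest-neighbour resize of a (C, H, W) map to size = (H_out, W_out)."""
    _, h, w = x.shape
    rows = np.arange(size[0]) * h // size[0]
    cols = np.arange(size[1]) * w // size[1]
    return x[:, rows][:, :, cols]

def attention_gate(x):
    """Per-pixel gate in (0, 1). Assumption: a sigmoid of the channel mean
    stands in for a learned attention map."""
    return 1.0 / (1.0 + np.exp(-x.mean(axis=0, keepdims=True)))

def dense_pass(scales, order):
    """One direction: each scale adds the gated outputs of ALL scales
    visited earlier in `order` (dense connections)."""
    out = {}
    for i in order:
        fused = scales[i].copy()
        for j in out:
            prev = resize(out[j], scales[i].shape[1:])
            fused += attention_gate(prev) * prev
        out[i] = fused
    return out

def msb_sketch(deep, pool_sizes=(1, 2, 4)):
    """Hypothetical multi-scale bidirectional fusion built from deep features only."""
    scales = [avg_pool(deep, k) for k in pool_sizes]
    idx = list(range(len(scales)))
    fwd = dense_pass(scales, idx)        # fine-to-coarse direction
    bwd = dense_pass(scales, idx[::-1])  # coarse-to-fine direction
    size = deep.shape[1:]
    merged = sum(resize(fwd[i] + bwd[i], size) for i in idx)
    return merged / (2 * len(idx))

# Example: an 8-channel, 16 x 16 "deep" feature map.
deep = np.random.default_rng(0).standard_normal((8, 16, 16))
fused = msb_sketch(deep)
```

The output keeps the shape of the input feature map, so it could feed a per-pixel skeleton classifier. In a real network the pooling, gating, and fusion steps would be learned layers rather than the fixed operations used here.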