论文信息 - A CNN Model for Semantic Person Part Segmentation With Capacity Optimization

A CNN Model for Semantic Person Part Segmentation With Capacity Optimization

In this paper, a deep learning model with an optimal capacity is proposed to improve the performance of person part segmentation. Previous efforts in optimizing the capacity of a convolutional neural network (CNN) model suffer from a lack of large datasets as well as the over-dependence on a single-modality CNN, which is not effective in learning. We make several efforts in addressing these problems. First, other datasets are utilized to train a CNN module for pre-processing image data and a segmentation performance improvement is achieved without a time-consuming annotation process. Second, we propose a novel way of integrating two complementary modules to enrich the feature representations for more reliable inferences. Third, the factors to determine the capacity of a CNN model are studied and two novel methods are proposed to adjust (optimize) the capacity of a CNN to match it to the complexity of a task. The over-fitting and under-fitting problems are eased by using our methods. Experimental results show that our model outperforms the state-of-the-art deep learning models with a better generalization ability and a lower computational complexity.

Zheru Chi | Yalong Jiang

[1] Luc Van Gool,et al. The Pascal Visual Object Classes Challenge: A Retrospective , 2014, International Journal of Computer Vision.

[2] Jun Zhu,et al. Pose-Guided Human Parsing with Deep Learned Features , 2015, ArXiv.

[3] Wolfram Burgard,et al. Deep learning for human part discovery in images , 2016, 2016 IEEE International Conference on Robotics and Automation (ICRA).

[4] Shuicheng Yan,et al. Semantic Object Parsing with Graph LSTM , 2016, ECCV.

[5] Ian D. Reid,et al. RefineNet: Multi-path Refinement Networks for High-Resolution Semantic Segmentation , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[6] Liang Lin,et al. Look into Person: Joint Body Parsing & Pose Estimation Network and a New Benchmark , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[7] Konstantinos Kamnitsas,et al. DeepCut: Object Segmentation From Bounding Box Annotations Using Convolutional Neural Networks , 2016, IEEE Transactions on Medical Imaging.

[8] H. Abdi,et al. Principal component analysis , 2010 .

[9] Alex Krizhevsky,et al. Learning Multiple Layers of Features from Tiny Images , 2009 .

[10] Jian Sun,et al. ScribbleSup: Scribble-Supervised Convolutional Networks for Semantic Segmentation , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[11] Thomas Brox,et al. 3D U-Net: Learning Dense Volumetric Segmentation from Sparse Annotation , 2016, MICCAI.

[12] Vladimir Vapnik,et al. Chervonenkis: On the uniform convergence of relative frequencies of events to their probabilities , 1971 .

[13] Nikos Komodakis,et al. Wide Residual Networks , 2016, BMVC.

[14] Trevor Darrell,et al. Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[15] Camille Couprie,et al. Learning Hierarchical Features for Scene Labeling , 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[16] Andrew Zisserman,et al. Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[17] Vladlen Koltun,et al. Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials , 2011, NIPS.

[18] Geoffrey E. Hinton,et al. Simplifying Neural Networks by Soft Weight-Sharing , 1992, Neural Computation.

[19] Jonathan T. Barron,et al. Semantic Image Segmentation with Task-Specific Edge Detection Using CNNs and a Discriminatively Trained Domain Transform , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[20] Martial Hebert,et al. Growing a Brain: Fine-Tuning by Increasing Model Capacity , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[21] Vladlen Koltun,et al. Multi-Scale Context Aggregation by Dilated Convolutions , 2015, ICLR.

[22] Peng Wang,et al. Joint Multi-person Pose Estimation and Semantic Part Segmentation , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[23] Naftali Tishby,et al. Deep learning and the information bottleneck principle , 2015, 2015 IEEE Information Theory Workshop (ITW).

[24] Olivier Sigaud,et al. Towards Deep Developmental Learning , 2016, IEEE Transactions on Cognitive and Developmental Systems.

[25] Koen E. A. van de Sande,et al. Selective Search for Object Recognition , 2013, International Journal of Computer Vision.

[26] Mark Everingham,et al. Learning effective human pose estimation from inaccurate annotation , 2011, CVPR 2011.

[27] Jianxin Wu,et al. Deep Label Distribution Learning With Label Ambiguity , 2016, IEEE Transactions on Image Processing.

[28] Thomas Brox,et al. FlowNet: Learning Optical Flow with Convolutional Networks , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[29] Yi Yang,et al. Attention to Scale: Scale-Aware Semantic Image Segmentation , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[30] Zheru Chi,et al. A Fully-Convolutional Framework for Semantic Segmentation , 2017, 2017 International Conference on Digital Image Computing: Techniques and Applications (DICTA).

[31] Kaiming He,et al. Mask R-CNN , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[32] Vibhav Vineet,et al. Conditional Random Fields as Recurrent Neural Networks , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[33] Yi Yang,et al. A Probabilistic Associative Model for Segmenting Weakly Supervised Images , 2014, IEEE Transactions on Image Processing.

[34] Jian Sun,et al. BoxSup: Exploiting Bounding Boxes to Supervise Convolutional Networks for Semantic Segmentation , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[35] George Papandreou,et al. Rethinking Atrous Convolution for Semantic Image Segmentation , 2017, ArXiv.

[36] Yueting Zhuang,et al. DeepSaliency: Multi-Task Deep Neural Network Model for Salient Object Detection , 2015, IEEE Transactions on Image Processing.

[37] Sanja Fidler,et al. Detect What You Can: Detecting and Representing Objects Using Holistic Models and Body Parts , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[38] Yoshua Bengio,et al. How transferable are features in deep neural networks? , 2014, NIPS.

[39] Yan Wang,et al. SORT: Second-Order Response Transform for Visual Recognition , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[40] Xiaoxiao Li,et al. Not All Pixels Are Equal: Difficulty-Aware Semantic Segmentation via Deep Layer Cascade , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[41] Alan L. Yuille,et al. Zoom Better to See Clearer: Human and Object Parsing with Hierarchical Auto-Zoom Net , 2015, ECCV.

[42] Iasonas Kokkinos,et al. DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[43] Bernt Schiele,et al. 2D Human Pose Estimation: New Benchmark and State of the Art Analysis , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[44] Shuicheng Yan,et al. Semantic Object Parsing with Local-Global Long Short-Term Memory , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[45] Kaiming He,et al. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[46] T. Moon. The expectation-maximization algorithm , 1996, IEEE Signal Process. Mag..

[47] Yaser Sheikh,et al. OpenPose: Realtime Multi-Person 2D Pose Estimation Using Part Affinity Fields , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[48] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[49] Jia Xu,et al. Learning to segment under various forms of weak supervision , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[50] Pietro Perona,et al. Microsoft COCO: Common Objects in Context , 2014, ECCV.

[51] Zhi-Hua Zhou,et al. A brief introduction to weakly supervised learning , 2018 .

[52] Jianping Fan,et al. Automatic image segmentation by integrating color-edge extraction and seeded region growing , 2001, IEEE Trans. Image Process..