Weakly Supervised Deep Semantic Segmentation Using CNN and ELM with Semantic Candidate Regions

The task of semantic segmentation is to obtain strong pixel-level annotations for each pixel in the image. For fully supervised semantic segmentation, the task is achieved by a segmentation model trained using pixel-level annotations. However, the pixel-level annotation process is very expensive and time-consuming. To reduce the cost, the paper proposes a semantic candidate regions trained extreme learning machine (ELM) method with image-level labels to achieve pixel-level labels mapping. In this work, the paper casts the pixel mapping problem into a candidate region semantic inference problem. Specifically, after segmenting each image into a set of superpixels, superpixels are automatically combined to achieve segmentation of candidate region according to the number of image-level labels. Semantic inference of candidate regions is realized based on the relationship and neighborhood rough set associated with semantic labels. Finally, the paper trains the ELM using the candidate regions of the inferred labels to classify the test candidate regions. The experiment is verified on the MSRC dataset and PASCAL VOC 2012, which are popularly used in semantic segmentation. The experimental results show that the proposed method outperforms several state-of-the-art approaches for deep semantic segmentation.

[1]  Pascal Fua,et al.  SLIC Superpixels Compared to State-of-the-Art Superpixel Methods , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[2]  Xuelong Li,et al.  Unsupervised image saliency detection with Gestalt-laws guided optimization and visual attention based refinement , 2018, Pattern Recognit..

[3]  Xinying Xu,et al.  SR-POD: Sample rotation based on principal-axis orientation distribution for data augmentation in deep object detection , 2018, Cognitive Systems Research.

[4]  Xiaodong Li,et al.  A Dynamic Neighborhood Learning-Based Gravitational Search Algorithm , 2018, IEEE Transactions on Cybernetics.

[5]  Jing Liu,et al.  Weakly-Supervised Dual Clustering for Image Semantic Segmentation , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[6]  Chen Chen,et al.  Gabor Convolutional Networks , 2018, WACV.

[7]  George Papandreou,et al.  Weakly-and Semi-Supervised Learning of a Deep Convolutional Network for Semantic Image Segmentation , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[8]  Jian Sun,et al.  ScribbleSup: Scribble-Supervised Convolutional Networks for Semantic Segmentation , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[9]  Stephen Marshall,et al.  Cognitive Fusion of Thermal and Visible Imagery for Effective Detection and Tracking of Pedestrians in Videos , 2018, Cognitive Computation.

[10]  Trevor Darrell,et al.  Constrained Convolutional Neural Networks for Weakly Supervised Segmentation , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[11]  Yao Zhao,et al.  Learning to segment with image-level annotations , 2016, Pattern Recognit..

[12]  한보형,et al.  Learning Deconvolution Network for Semantic Segmentation , 2015 .

[13]  Zheng Wang,et al.  A deep-learning based feature hybrid framework for spatiotemporal saliency detection inside videos , 2018, Neurocomputing.

[14]  Yue Gao,et al.  Zero-Shot Learning With Transferred Samples , 2017, IEEE Transactions on Image Processing.

[15]  Stephen Marshall,et al.  MIMR-DGSA: Unsupervised hyperspectral band selection based on information theory and a modified discrete gravitational search algorithm , 2019, Inf. Fusion.

[16]  Qiang Ni,et al.  Joint Image-Text Hashing for Fast Large-Scale Cross-Media Retrieval Using Self-Supervised Deep Learning , 2019, IEEE Transactions on Industrial Electronics.

[17]  Tao Xiang,et al.  Weakly-Supervised Image Annotation and Segmentation with Objects and Attributes , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[18]  Ronan Collobert,et al.  From image-level to pixel-level labeling with Convolutional Networks , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[19]  Joachim M. Buhmann,et al.  Weakly supervised structured output learning for semantic segmentation , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[20]  Jungong Han,et al.  Large-scale image retrieval with Sparse Embedded Hashing , 2017, Neurocomputing.

[21]  Qing-Hua Hu,et al.  Numerical Attribute Reduction Based on Neighborhood Granulation and Rough Approximation: Numerical Attribute Reduction Based on Neighborhood Granulation and Rough Approximation , 2008 .

[22]  Yunchao Wei,et al.  STC: A Simple to Complex Framework for Weakly-Supervised Semantic Segmentation , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[23]  Ling Shao,et al.  Unsupervised Deep Video Hashing via Balanced Code for Large-Scale Video Retrieval , 2019, IEEE Transactions on Image Processing.

[24]  Joachim M. Buhmann,et al.  Towards weakly supervised semantic segmentation by means of multiple instance and multitask learning , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[25]  Joachim M. Buhmann,et al.  Weakly supervised semantic segmentation with a multi-image model , 2011, 2011 International Conference on Computer Vision.

[26]  Yunchao Wei,et al.  Revisiting Dilated Convolution: A Simple Approach for Weakly- and Semi-Supervised Semantic Segmentation , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[27]  Antonio Criminisi,et al.  TextonBoost: Joint Appearance, Shape and Context Modeling for Multi-class Object Recognition and Segmentation , 2006, ECCV.

[28]  Ivan Laptev,et al.  Learning and Transferring Mid-level Image Representations Using Convolutional Neural Networks , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[29]  Ke Zhang,et al.  Sparse Reconstruction for Weakly Supervised Semantic Segmentation , 2013, IJCAI.

[30]  Wu Ke,et al.  Image Semantic Analysis and Understanding:A Review , 2010 .

[31]  Feng Wu,et al.  Background Prior-Based Salient Object Detection via Deep Reconstruction Residual , 2015, IEEE Transactions on Circuits and Systems for Video Technology.

[32]  Zhiwu Lu,et al.  Learning from Weak and Noisy Labels for Semantic Segmentation , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[33]  Huang Qinghua,et al.  Numerical Attribute Reduction Based on Neighborhood Granulation and Rough Approximation , 2008 .

[34]  Yao Zhao,et al.  Object Region Mining with Adversarial Erasing: A Simple Classification to Semantic Segmentation Approach , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[35]  Luc Van Gool,et al.  The Pascal Visual Object Classes (VOC) Challenge , 2010, International Journal of Computer Vision.

[36]  Guang-Bin Huang,et al.  Extreme learning machine: a new learning scheme of feedforward neural networks , 2004, 2004 IEEE International Joint Conference on Neural Networks (IEEE Cat. No.04CH37541).

[37]  Subhransu Maji,et al.  Semantic contours from inverse detectors , 2011, 2011 International Conference on Computer Vision.