Paper information - Combining Bottom-Up, Top-Down, and Smoothness Cues for Weakly Supervised Image Segmentation

This paper addresses the problem of weakly supervised semantic image segmentation. Our goal is to label every pixel in a new image, given only image-level object labels associated with training images. Our problem statement differs from common semantic segmentation, where pixel-wise annotations are typically assumed to be available in training. We specify a novel deep architecture which fuses three distinct computation processes toward semantic segmentation, namely: (i) the bottom-up computation of neural activations in a CNN for the image-level prediction of object classes, (ii) the top-down estimation of conditional likelihoods of the CNN's activations given the predicted objects, resulting in probabilistic attention maps per object class, and (iii) the lateral attention-message passing from neighboring neurons at the same CNN layer. The fusion of (i)-(iii) is realized via a conditional random field formulated as a recurrent network, aimed at generating a smooth and boundary-preserving segmentation. Unlike existing work, we formulate unified end-to-end learning of all components of our deep architecture. Evaluation on the benchmark PASCAL VOC 2012 dataset demonstrates that we outperform reasonable weakly supervised baselines and state-of-the-art approaches.
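To make the fusion of the three cues more concrete, here is a minimal NumPy sketch. It is not the authors' architecture: it assumes the bottom-up cue is a vector of image-level class scores, the top-down cue is one attention map per class, and the smoothness cue can be approximated by a few simplified mean-field updates that reward agreement with 4-connected neighbors. The names `fuse_cues`, `neighbor_mean`, `n_iters`, and `w_smooth` are hypothetical, and unlike a boundary-preserving CRF, this simplified smoothing term ignores image appearance.

```python
import numpy as np

def softmax(x, axis=0):
    # Numerically stable softmax along the class axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def neighbor_mean(q):
    # Average each class map over its 4-connected neighbors (edge-padded).
    p = np.pad(q, ((0, 0), (1, 1), (1, 1)), mode="edge")
    return (p[:, :-2, 1:-1] + p[:, 2:, 1:-1]
            + p[:, 1:-1, :-2] + p[:, 1:-1, 2:]) / 4.0

def fuse_cues(bottom_up, top_down, n_iters=5, w_smooth=2.0):
    """Fuse cues into per-pixel class probabilities (illustrative only).

    bottom_up: (C,) image-level class scores in (0, 1].
    top_down:  (C, H, W) per-class attention maps in [0, 1].
    """
    eps = 1e-8
    # Unary log-potentials: attention maps weighted by class presence.
    unary = np.log(top_down * bottom_up[:, None, None] + eps)
    q = softmax(unary, axis=0)
    for _ in range(n_iters):
        # Simplified mean-field step: unary term plus neighbor agreement.
        q = softmax(unary + w_smooth * neighbor_mean(q), axis=0)
    return q

# Toy usage: two classes on a 4x4 grid with one noisy attention pixel.
attention = np.full((2, 4, 4), 0.1)
attention[0, :, :2] = 0.9   # class 0 attends to the left half
attention[1, :, 2:] = 0.9   # class 1 attends to the right half
attention[0, 1, 0], attention[1, 1, 0] = 0.4, 0.6  # noisy pixel
labels = fuse_cues(np.array([0.9, 0.8]), attention).argmax(axis=0)
```

In this toy setup the noisy pixel initially favors class 1, and the neighbor term pulls it back toward the label of the surrounding region. In the paper, by contrast, the fusion is performed by a CRF formulated as a recurrent network and learned end to end.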

Sinisa Todorovic | Anirban Roy
