Joint Calibration for Semantic Segmentation

Semantic segmentation is the task of assigning a class label to each pixel in an image. We propose a region-based semantic segmentation framework that handles both full and weak supervision and addresses three common problems: (1) Objects occur at multiple scales, so regions should be used at multiple scales. However, these regions overlap, which creates conflicting class predictions at the pixel level. (2) Class frequencies are highly imbalanced in realistic datasets. (3) Each pixel can be assigned to only a single class, which creates competition between classes. We address all three problems with a joint calibration method that optimizes a multi-class loss defined over the final pixel-level output labeling, rather than over region classification alone. Our method outperforms the state of the art on the popular SIFT Flow [18] dataset in both the fully and weakly supervised settings by a considerable margin (+6% and +10%, respectively).
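To make problems (1) and (3) concrete, the sketch below shows one plausible way to turn overlapping region scores into a single label per pixel. It is an illustration under stated assumptions, not the method as specified in the paper: the abstract does not give the calibration function or the pixel-level decision rule, so the per-class sigmoid rescaling (parameters `scale` and `offset`), the max-over-regions rule, and the function name `pixel_labels` are all hypothetical.

```python
import numpy as np


def pixel_labels(region_masks, region_scores, scale, offset):
    """Resolve overlapping region predictions into one label per pixel.

    region_masks:  (R, H, W) bool, True where region r covers a pixel.
    region_scores: (R, C) float, raw classifier score of each region per class.
    scale, offset: (C,) float, hypothetical per-class calibration parameters.
    Returns an (H, W) int array; pixels covered by no region get -1.
    """
    # Assumed calibration: a per-class sigmoid rescaling of the raw scores.
    calibrated = 1.0 / (1.0 + np.exp(-(region_scores * scale + offset)))  # (R, C)

    # Broadcast to (R, C, H, W); regions that do not cover a pixel contribute -inf.
    masked = np.where(
        region_masks[:, None, :, :], calibrated[:, :, None, None], -np.inf
    )

    # Assumed decision rule: for each pixel and class, keep the best score over
    # all regions containing the pixel, then let the classes compete via argmax.
    per_pixel = masked.max(axis=0)     # (C, H, W)
    labels = per_pixel.argmax(axis=0)  # (H, W)

    covered = region_masks.any(axis=0)
    return np.where(covered, labels, -1)


# Toy example: a 2x3 image with two overlapping regions and two classes.
masks = np.zeros((2, 2, 3), dtype=bool)
masks[0, :, 0:2] = True  # region 0 covers columns 0 and 1
masks[1, :, 1:3] = True  # region 1 covers columns 1 and 2
scores = np.array([[2.0, 0.0],   # region 0 favours class 0
                   [0.0, 1.0]])  # region 1 favours class 1

uncalibrated = pixel_labels(masks, scores, np.ones(2), np.zeros(2))
# Lowering the offset of class 0 shifts the competition toward class 1.
recalibrated = pixel_labels(masks, scores, np.ones(2), np.array([-3.0, 0.0]))
```

In the toy example the middle column is covered by both regions, so its label depends on how the two classes' scores compare after calibration. This is why calibration parameters tuned on the final pixel labeling can behave differently from parameters tuned on region classification alone.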

[1]  Yue Gao,et al.  Representative Discovery of Structure Cues for Weakly-Supervised Image Segmentation , 2014, IEEE Transactions on Multimedia.

[2]  G. G. Stokes "J." , 1890, The New Yale Book of Quotations.

[3]  Joachim M. Buhmann,et al.  Weakly supervised structured output learning for semantic segmentation , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[4]  Raquel Urtasun,et al.  Fully Connected Deep Structured Networks , 2015, ArXiv.

[5]  Iasonas Kokkinos,et al.  Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs , 2014, ICLR.

[6]  Marcus Liwicki,et al.  Scene labeling with LSTM recurrent neural networks , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[7]  Yann LeCun,et al.  Learning Hierarchical Features for Scene Labeling , 2013 .

[8]  Marc Toussaint,et al.  Multi-class image segmentation using conditional random fields and global classification , 2009, ICML '09.

[9]  Gang Wang,et al.  Integrating parametric and non-parametric models for scene labeling , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[10]  Alain Trémeau,et al.  Contextually Constrained Deep Networks for Scene Labeling , 2014, BMVC.

[11]  Joost van de Weijer,et al.  Fusing Global and Local Scale for Semantic Image Segmentation , 2011 .

[12]  Bill Triggs,et al.  Region Classification with Markov Field Aspect Models , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[13]  Guosheng Lin,et al.  Efficient Piecewise Training of Deep Structured Models for Semantic Segmentation , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[14]  Gregory Shakhnarovich,et al.  Feedforward semantic segmentation with zoom-out features , 2014, CVPR.

[15]  George Papandreou,et al.  Weakly- and Semi-Supervised Learning of a DCNN for Semantic Image Segmentation , 2015, ArXiv.

[16]  Jitendra Malik,et al.  Simultaneous Detection and Segmentation , 2014, ECCV.

[17]  Jia Xu,et al.  Learning to segment under various forms of weak supervision , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[18]  Sheng Zeng,et al.  Weakly supervised semantic segmentation for social images , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[19]  Svetlana Lazebnik,et al.  Superparsing , 2010, International Journal of Computer Vision.

[20]  Svetlana Lazebnik,et al.  Finding Things: Image Parsing with Regions and Per-Exemplar Detectors , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[21]  Trevor Darrell,et al.  Fully Convolutional Networks for Semantic Segmentation , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[22]  Antonio Torralba,et al.  Nonparametric Scene Parsing via Label Transfer , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[23]  Luc Van Gool,et al.  The Pascal Visual Object Classes (VOC) Challenge , 2010, International Journal of Computer Vision.

[24]  Jia Xu,et al.  Tell Me What You See and I Will Show You Where It Is , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[25]  Svetlana Lazebnik,et al.  Scene Parsing with Object Instances and Occlusion Ordering , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[26]  George Papandreou,et al.  Weakly- and Semi-Supervised Learning of a Deep Convolutional Network for Semantic Image Segmentation , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[27]  Ronan Collobert,et al.  Recurrent Convolutional Neural Networks for Scene Labeling , 2014, ICML.

[28]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[29]  Ming-Yu Liu,et al.  Recursive Context Propagation Network for Semantic Scene Labeling , 2014, NIPS.

[30]  John Platt,et al.  Probabilistic Outputs for Support Vector Machines and Comparisons to Regularized Likelihood Methods , 1999 .

[31]  Joost van de Weijer,et al.  Harmony Potentials , 2011, International Journal of Computer Vision.

[32]  Trevor Darrell,et al.  Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[33]  Svetlana Lazebnik,et al.  Understanding scenes on many levels , 2011, 2011 International Conference on Computer Vision.

[34]  Michael S. Bernstein,et al.  ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.

[35]  Joachim M. Buhmann,et al.  Weakly supervised semantic segmentation with a multi-image model , 2011, 2011 International Conference on Computer Vision.

[36]  Cristian Sminchisescu,et al.  Composite Statistical Inference for Semantic Segmentation , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[37]  Ross B. Girshick,et al.  Fast R-CNN , 2015, arXiv:1504.08083.

[39]  Cristian Sminchisescu,et al.  Constrained parametric min-cuts for automatic object segmentation , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[40]  David W. Jacobs,et al.  Deep hierarchical parsing for semantic segmentation , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[41]  Marian George,et al.  Image parsing with a wide range of classes and scene-level context , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[42]  Koen E. A. van de Sande,et al.  Selective Search for Object Recognition , 2013, International Journal of Computer Vision.

[43]  智一 吉田,et al.  A Study of an Automatic Field Map Creation Method Using Efficient Graph-Based Image Segmentation , 2014 .

[44]  Ronan Collobert,et al.  From image-level to pixel-level labeling with Convolutional Networks , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[45]  Vibhav Vineet,et al.  Conditional Random Fields as Recurrent Neural Networks , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[46]  Ronan Collobert,et al.  Recurrent Convolutional Neural Networks for Scene Parsing , 2013, ArXiv.

[47]  Cristian Sminchisescu,et al.  Semantic Segmentation with Second-Order Pooling , 2012, ECCV.

[48]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[49]  Ming-Hsuan Yang,et al.  Context Driven Scene Parsing with Attention to Rare Classes , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[50]  Camille Couprie,et al.  Learning Hierarchical Features for Scene Labeling , 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.