Propose and Re-rank Semantic Segmentation via Deep Image Classification

Image classification (i.e., the task of predicting the presence or absence of an object category in an image) is the poor man’s first step towards automatic image understanding. Object detection (i.e., localizing the object in the image) and semantic image segmentation (i.e., delineating the boundaries of an object) are progressively more detailed and arguably more useful image understanding tasks. Recently, large-scale datasets such as ImageNet [5] have enabled a resurgence of interest in image classification [8, 12–15, 18], especially in the context of learning representations [2] via deep convolutional neural networks (CNNs) [6, 7, 10]. At ILSVRC 2012, Krizhevsky et al. [10] presented a CNN that significantly outperformed the other entries in the image classification task, a result later vindicated by a repeat performance at ILSVRC 2013. These improvements in image classification have been shown to generalize well and even to help more detailed vision tasks. Donahue et al. [6] showed that intermediate responses of these CNNs can be used as features that generalize well to classification tasks on other datasets. Girshick et al. [7] showed that classification CNNs can also be adapted to perform state-of-the-art object detection.

In this work, we study how improvements in image classification can benefit semantic segmentation. Specifically, we study a “propose and re-rank” framework that has gained popularity in the semantic segmentation literature and has been shown to achieve state-of-the-art results on the PASCAL segmentation challenge [3, 17]. Fig. 1 shows an illustration. Multiple diverse proposals, either of regions or of entire segmentation maps, are produced and then re-ranked so that better proposals score higher. The proposals are not perfect, so errors made at the proposal stage propagate along the pipeline. The benefit of this approach, however, is that the re-ranking problem is simple: it involves choosing 1 of just M proposals.
The re-ranking problem is thus amenable to global optimization, and it allows the incorporation of arbitrary features and high-order interactions. One can therefore conveniently “plug and play” image classification as a “submodule” in the re-ranking step for semantic segmentation.

Image classifier                    Classification mean AP (%)    PASCAL segmentation accuracy (%)
Random                              7.41                          44.81
Gist + linear SVM                   26.83                         43.89
SUN [16] kernels + SVM              60.60                         44.90
DeCAF7 [6] + linear SVM + dropout   74.52                         45.61
DeCAF7 + RBF SVM                    76.85                         46.93
Ground truth                        100.00                        53.22

Table 1. Image classification performance versus semantic segmentation performance when re-ranking with the corresponding image classification feature alone on PASCAL VOC12 val.
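The re-ranking step described above can be illustrated with a minimal Python sketch. It assumes that each of the M proposals is summarized by the set of object categories it labels plus a small vector of other features, and that an image classifier supplies a probability per category. The `classifier_consistency` feature is a hypothetical example of how classifier output could be plugged in; it is not necessarily the feature used in this work. The names `Proposal`, `classifier_consistency`, and `rerank`, the linear scoring rule, and the hand-set weights are likewise assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class Proposal:
    """A candidate segmentation map (hypothetical, simplified representation)."""
    name: str
    labels_present: Set[str]      # object categories that appear in the segmentation
    base_features: List[float]    # other re-ranking features, e.g. a proposal-generator score


def classifier_consistency(labels_present: Set[str], class_scores: Dict[str, float]) -> float:
    """Hypothetical feature: mean agreement between a proposal's label set and
    image-level classifier probabilities (p if the class is present, 1 - p if absent)."""
    if not class_scores:
        return 0.0
    agreement = sum(p if cls in labels_present else 1.0 - p
                    for cls, p in class_scores.items())
    return agreement / len(class_scores)


def rerank(proposals: List[Proposal], class_scores: Dict[str, float],
           weights: List[float]) -> List[Proposal]:
    """Score each of the M proposals with a linear model over its own features plus
    the classifier-consistency feature, and return the proposals best-first."""
    def score(prop: Proposal) -> float:
        feats = prop.base_features + [classifier_consistency(prop.labels_present, class_scores)]
        if len(feats) != len(weights):
            raise ValueError("one weight per feature is required")
        return sum(w * f for w, f in zip(weights, feats))

    return sorted(proposals, key=score, reverse=True)


if __name__ == "__main__":
    # Toy image-level classifier output (probabilities; values are made up).
    class_scores = {"person": 0.9, "horse": 0.8, "car": 0.1}
    proposals = [
        Proposal("A", {"person", "car"}, [0.7]),
        Proposal("B", {"person", "horse"}, [0.6]),
        Proposal("C", set(), [0.9]),   # background-only map
    ]
    weights = [1.0, 2.0]  # hand-set here; a real re-ranker would learn these
    ranked = rerank(proposals, class_scores, weights)
    print([p.name for p in ranked])  # ['B', 'C', 'A']: B is the 1-of-M choice
```

Because re-ranking only has to pick 1 of M proposals, any feature that can be computed for a whole proposal, such as the label-set agreement above, can be appended to the feature list without changing how the best proposal is selected. In practice the weights would be learned from training data rather than set by hand.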

[1] Koen E. A. van de Sande et al. Codemaps - Segment, Classify and Search Objects Locally, 2013, 2013 IEEE International Conference on Computer Vision.

[2] Cordelia Schmid et al. Combining efficient object localization and image classification, 2009, 2009 IEEE 12th International Conference on Computer Vision.

[3] Pascal Vincent et al. Unsupervised Feature Learning and Deep Learning: A Review and New Perspectives, 2012, ArXiv.

[4] Trevor Darrell et al. DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition, 2013, ICML.

[5] Cristian Sminchisescu et al. Constrained parametric min-cuts for automatic object segmentation, 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[6] Cristian Sminchisescu et al. Semantic Segmentation with Second-Order Pooling, 2012, ECCV.

[7] Krista A. Ehinger et al. SUN database: Large-scale scene recognition from abbey to zoo, 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[8] Geoffrey E. Hinton et al. ImageNet classification with deep convolutional neural networks, 2012, Commun. ACM.

[9] Jian Dong et al. Contextualizing Object Detection and Classification, 2015, IEEE Trans. Pattern Anal. Mach. Intell.

[10] Fei-Fei Li et al. ImageNet: A large-scale hierarchical image database, 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[11] Gregory Shakhnarovich et al. Diverse M-Best Solutions in Markov Random Fields, 2012, ECCV.

[12] Thomas S. Huang et al. Image Classification Using Super-Vector Coding of Local Image Descriptors, 2010, ECCV.

[13] Trevor Darrell et al. Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation, 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[14] Jan C. van Gemert et al. Exploiting photographic style for category-level image classification by generalizing the spatial pyramid, 2011, ICMR.

[15] Gregory Shakhnarovich et al. Discriminative Re-ranking of Diverse Segmentations, 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[16] Ming Yang et al. Large-scale image classification: Fast feature extraction and SVM training, 2011, CVPR 2011.