Multi-hypothesis contextual modeling for semantic segmentation

Abstract: Semantic segmentation (i.e., image parsing) aims to annotate each image pixel with its corresponding semantic class label. Spatially consistent labeling of the image requires an accurate description and modeling of the local contextual information. The segmentation result is typically improved by Markov Random Field (MRF) optimization of the initial labels. However, this improvement is limited by the accuracy of the initial result and by how the contextual neighborhood is defined. In this paper, we develop generalized and flexible contextual models for segmentation neighborhoods in order to improve parsing accuracy. Instead of using a fixed segmentation and neighborhood definition, we explore various contextual models that fuse the complementary information available in alternative segmentations of the same image. In other words, we propose a novel MRF framework that describes and optimizes the contextual dependencies between multiple segmentations. Simulation results on two common datasets demonstrate significant improvement in parsing accuracy over the baseline approaches.
