Multi-scale context for scene labeling via flexible segmentation graph

The use of contextual information for scene labeling has attracted substantial attention in image processing and computer vision. In this paper, we present a fusion model based on a flexible segmentation graph (FSG) that exploits multi-scale context for the scene labeling problem. Given a family of segmentations, the FSG is constructed from the spatial relationships among these segmentations. Within the FSG, label inference is formulated as a contextual fusion model trained from discriminative classifiers. Previous approaches usually employ Conditional Random Fields (CRFs) or hierarchical models to exploit contextual information. In contrast, our FSG representation is flexible, efficient and free of hierarchical constraints, which allows us to capture a wide variety of visual context for scene labeling. Our approach yields state-of-the-art results on the MSRC dataset (21 classes) and the LHI dataset (15 classes), and near-record results on the SIFT Flow dataset (33 classes) and the PASCAL VOC segmentation dataset (20 classes), while labeling a 320×240 image in less than a second. Notably, our approach also outperforms recent CNN-based methods.

Highlights
- We propose a novel flexible segmentation graph (FSG) representation to capture multi-scale visual context for the scene labeling problem.
- Within the FSG, we establish a contextual fusion model to formulate multi-scale context.
- We have evaluated our method on four datasets. Our model yields labeling results comparable to or better than those of competing models, and is also computationally efficient.
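
The abstract describes the FSG only at a high level. Nodes come from a family of segmentations, edges come from their spatial relationships, and labels are inferred by fusing classifier outputs over the graph. The sketch below is an illustrative stand-in, not the authors' method. It assumes that "spatial relationship" means overlap across segmentations plus 4-connected adjacency within a single segmentation. It also replaces the trained contextual fusion model with a fixed blend of each segment's scores and the mean scores of its neighbours. The names `build_fsg`, `fuse_scores`, `label_pixels` and the weight `alpha` are hypothetical.

```python
from itertools import combinations


def build_fsg(segmentations):
    """Build a hypothetical FSG-like graph from a family of segmentations.

    Each segmentation is an H x W grid (list of lists) of integer segment ids.
    A node is (k, segment_id), where k indexes the segmentation. Two nodes are
    linked if they overlap (cover a common pixel in different segmentations)
    or are 4-adjacent within the same segmentation. No hierarchy is assumed.
    """
    h, w = len(segmentations[0]), len(segmentations[0][0])
    neighbours = {}

    def link(a, b):
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    for y in range(h):
        for x in range(w):
            covering = [(k, seg[y][x]) for k, seg in enumerate(segmentations)]
            for node in covering:
                neighbours.setdefault(node, set())
            # Overlap relation: segments from different segmentations sharing this pixel.
            for a, b in combinations(covering, 2):
                link(a, b)
            # Adjacency relation: right/down pixel with a different id in the same segmentation.
            for k, seg in enumerate(segmentations):
                for yy, xx in ((y + 1, x), (y, x + 1)):
                    if yy < h and xx < w and seg[yy][xx] != seg[y][x]:
                        link((k, seg[y][x]), (k, seg[yy][xx]))
    return neighbours


def fuse_scores(neighbours, unary, alpha=0.5):
    """Blend each node's class scores with the mean scores of its graph neighbours.

    `unary` maps every node to a list of per-class scores (for example, the
    output of a discriminative classifier). `alpha` is a fixed, hypothetical
    weight standing in for a trained fusion model.
    """
    fused = {}
    for node, scores in unary.items():
        nbrs = neighbours.get(node, set())
        if nbrs:
            context = [sum(unary[n][c] for n in nbrs) / len(nbrs) for c in range(len(scores))]
        else:
            context = scores
        fused[node] = [alpha * s + (1 - alpha) * m for s, m in zip(scores, context)]
    return fused


def label_pixels(segmentations, fused):
    """Give each pixel the class with the highest summed fused score over
    the segments (one per segmentation) that contain it."""
    h, w = len(segmentations[0]), len(segmentations[0][0])
    labels = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            per_segment = [fused[(k, seg[y][x])] for k, seg in enumerate(segmentations)]
            totals = [sum(col) for col in zip(*per_segment)]
            labels[y][x] = max(range(len(totals)), key=totals.__getitem__)
    return labels


if __name__ == "__main__":
    fine = [[0, 1], [2, 3]]    # four 1-pixel segments
    coarse = [[0, 0], [1, 1]]  # top row and bottom row
    segs = [fine, coarse]
    graph = build_fsg(segs)
    # Toy two-class scores (class 0 = "sky", class 1 = "ground"); segment (0, 1) is noisy.
    unary = {(0, 0): [0.9, 0.1], (0, 1): [0.45, 0.55], (0, 2): [0.2, 0.8],
             (0, 3): [0.1, 0.9], (1, 0): [0.8, 0.2], (1, 1): [0.3, 0.7]}
    print(label_pixels(segs, fuse_scores(graph, unary)))  # prints [[0, 0], [1, 1]]
```

In the toy example, the classifier alone slightly prefers class 1 for the fine segment `(0, 1)`. After fusion, its class-0 score rises to 0.525 against 0.475, because its overlapping coarse segment and most of its neighbours favour class 0. The paper's fusion model is instead trained from discriminative classifiers, so a faithful version would learn how to weight context rather than fixing `alpha`.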
