Latent Semantics Local Distribution for CRF-based Image Semantic Segmentation

Semantic image segmentation is the task of assigning a semantic label to every pixel of an image. This task is posed as a supervised learning problem in which the appearance of areas that correspond to a number of semantic categories is learned from a dataset of manually labelled images. This paper proposes a method that combines a region-based probabilistic graphical model, which builds on the recent success of Conditional Random Fields (CRFs) in semantic segmentation, with a salient-point-based bag-of-words paradigm. In a first stage, the image is oversegmented into patches. Then, in a CRF-based formulation, we learn both the appearance of each semantic category and the neighbouring relations between patches. In addition to patch features, we also consider information extracted at salient points that are detected in the patch’s vicinity. A visual word is associated with each salient point. Two different types of information are used. First, we consider the local weighted distribution of visual words. Using local (i.e. centred at each patch) word histograms enriches the classical global bag-of-words representation with positional information on word distributions. Second, we consider the un-normalised local distribution of a set of latent topics that are obtained by probabilistic Latent Semantic Analysis (pLSA). This distribution is obtained by the weighted accumulation of the latent topic distributions that are associated with the visual words in the area. The advantage of this second approach lies in the separate representation of the semantic content of each visual word. This allows us to treat the word contributions as independent in the CRF formulation without introducing overly strong simplifying assumptions. Tests on a publicly available dataset demonstrate the validity of the proposed salient point integration strategies. The results obtained with different configurations show an advance compared to other leading works in the area.
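
The two salient-point representations described above can be illustrated with a minimal sketch. The abstract only states that the distributions are "weighted"; the Gaussian distance weighting, the value of `sigma`, the function names, and the toy `topic_given_word` table (standing in for pLSA-estimated P(topic | word)) are all assumptions made for illustration, not the method's actual settings.

```python
import math

def gaussian_weight(point, centre, sigma):
    """Assumed weighting: Gaussian fall-off with distance from the patch centre."""
    dx = point[0] - centre[0]
    dy = point[1] - centre[1]
    return math.exp(-(dx * dx + dy * dy) / (2.0 * sigma * sigma))

def local_word_histogram(points, words, centre, n_words, sigma=20.0):
    """Local weighted distribution of visual words around one patch."""
    hist = [0.0] * n_words
    for p, w in zip(points, words):
        hist[w] += gaussian_weight(p, centre, sigma)
    return hist

def local_topic_distribution(points, words, centre, topic_given_word, sigma=20.0):
    """Un-normalised local topic distribution: weighted accumulation of the
    per-word topic distributions P(topic | word)."""
    n_topics = len(topic_given_word[0])
    acc = [0.0] * n_topics
    for p, w in zip(points, words):
        g = gaussian_weight(p, centre, sigma)
        for z in range(n_topics):
            acc[z] += g * topic_given_word[w][z]
    return acc

# Toy data (hypothetical): three salient points, a 3-word vocabulary, 2 topics.
points = [(10.0, 10.0), (12.0, 9.0), (80.0, 75.0)]
words = [0, 2, 1]
topic_given_word = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]  # each row sums to 1
centre = (10.0, 10.0)

hist = local_word_histogram(points, words, centre, n_words=3)
topics = local_topic_distribution(points, words, centre, topic_given_word)
```

Because each row of `topic_given_word` sums to one, the accumulated topic vector carries the same total mass as the word histogram, while each word contributes its own topic profile independently of the others.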
