Large-scale semantic co-labeling of image sets

As evidenced by video segmentation and cosegmentation approaches, exploiting multiple images is key to successful visual scene understanding. With the availability of increasingly large image sets, there is a clear need for methods that can efficiently analyze the similarities and structure across huge numbers of image pixels. Moreover, to make effective use of this data, these similarities should be considered not only locally, between neighboring pixels, but between all pairs of pixels across all images. In this paper, we tackle this challenging scenario by introducing a semantic co-labeling approach that performs efficient inference in a fully-connected CRF defined over the pixels, or superpixels, of an image set. Our experimental evaluation demonstrates that our approach yields improved accuracy at no additional computational cost compared to segmenting the individual images one after another. Furthermore, our formulation lets us perform inference over ten thousand images in a matter of seconds.
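The abstract does not spell out the inference algorithm. A common choice for fully-connected CRFs is mean-field inference, which Krähenbühl and Koltun made tractable for Gaussian edge potentials by evaluating the message passing step with high-dimensional (permutohedral lattice) filtering. The sketch below is only an illustration under stated assumptions, not the authors' implementation: it runs a naive O(N²) mean-field loop over superpixels pooled from several images, assumes a Potts label compatibility and a single Gaussian kernel on hypothetical appearance features (for example mean colour), and every function name and parameter in it is made up for this example.

```python
import math


def gaussian_kernel(fi, fj, bandwidth):
    """Gaussian affinity between two feature vectors (hypothetical: mean colour of a superpixel)."""
    d2 = sum((a - b) ** 2 for a, b in zip(fi, fj))
    return math.exp(-d2 / (2.0 * bandwidth ** 2))


def softmax_neg(energies):
    """Turn per-label energies into a normalised distribution, Q(l) proportional to exp(-E(l))."""
    m = min(energies)  # shift by the minimum for numerical stability
    exps = [math.exp(-(e - m)) for e in energies]
    z = sum(exps)
    return [e / z for e in exps]


def mean_field_colabel(features, unary, weight=1.0, bandwidth=1.0, iters=10):
    """Naive mean-field inference in a fully-connected Potts CRF (illustrative sketch).

    features: one feature vector per superpixel, pooled over ALL images of the set,
              so every superpixel is connected to every other one, across images.
    unary:    unary[i][l] is the cost of giving label l to superpixel i.
    Returns the approximate marginals Q and the per-superpixel argmax labels.
    """
    n = len(features)
    num_labels = len(unary[0])
    # Dense N x N kernel with no self-edges: this quadratic step is what
    # filter-based inference avoids when N reaches millions of (super)pixels.
    k = [[0.0 if i == j else gaussian_kernel(features[i], features[j], bandwidth)
          for j in range(n)] for i in range(n)]
    q = [softmax_neg(u) for u in unary]  # initialise from the unaries alone
    for _ in range(iters):
        # Parallel update: every marginal is recomputed from the previous iteration's Q.
        new_q = []
        for i in range(n):
            energies = []
            for l in range(num_labels):
                # Potts penalty: pay weight * k_ij for each j expected to take a label other than l.
                pairwise = weight * sum(k[i][j] * (1.0 - q[j][l]) for j in range(n))
                energies.append(unary[i][l] + pairwise)
            new_q.append(softmax_neg(energies))
        q = new_q
    labels = [max(range(num_labels), key=lambda l: qi[l]) for qi in q]
    return q, labels


if __name__ == "__main__":
    # Superpixels 0-1 come from image A, superpixels 2-3 from image B (toy data).
    features = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0], [0.05, 0.0, 0.0], [0.95, 1.0, 1.0]]
    # Superpixel 2 is ambiguous and slightly prefers label 1 on its own.
    unary = [[0.0, 2.0], [2.0, 0.0], [0.55, 0.45], [2.0, 0.0]]
    _, labels = mean_field_colabel(features, unary, weight=1.0, bandwidth=0.3)
    print(labels)  # [0, 1, 0, 1]
```

With `weight=0.0` the pairwise term vanishes and the ambiguous superpixel in image B keeps its unary preference (label 1); with the cross-image edges it joins the similar-looking superpixel from image A. The dense kernel makes each iteration quadratic in the number of superpixels, which is why a filtering-based message passing step is needed at the scale of thousands of images.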
