Coarse-to-Fine Semantic Video Segmentation Using Supervoxel Trees

We propose an exact, general, and efficient coarse-to-fine energy minimization strategy for semantic video segmentation. Our strategy is based on a hierarchical abstraction of the supervoxel graph that allows us to minimize an energy defined at the finest level of the hierarchy by minimizing a series of simpler energies defined over coarser graphs. The strategy is exact, i.e., it produces the same solution as minimizing over the finest graph. It is general, i.e., it can be used to minimize any energy function (e.g., one with unary, pairwise, and higher-order terms) with any existing energy minimization algorithm (e.g., graph cuts or belief propagation). It also gives significant inference speedups on several datasets with varying degrees of spatio-temporal continuity. We also discuss the strengths and weaknesses of our strategy relative to existing hierarchical approaches, and the kinds of image and video data that yield the best speedups.
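To make the notion of an energy over a supervoxel graph concrete, the following is a minimal sketch and not the paper's algorithm. It assumes a hypothetical toy graph of four supervoxels in a chain, two labels, integer unary costs, and a Potts pairwise penalty, and it finds the minimizer by exhaustive search. All names and numbers are illustrative. An exact coarse-to-fine strategy would have to return this same labeling, only without enumerating every assignment.

```python
from itertools import product


def energy(labels, unary, edges, potts_weight):
    """Sum of per-node unary costs plus a Potts penalty on each edge whose endpoints disagree."""
    cost = sum(unary[node][lab] for node, lab in enumerate(labels))
    cost += sum(potts_weight for i, j in edges if labels[i] != labels[j])
    return cost


def brute_force_minimize(unary, edges, potts_weight):
    """Enumerate every labeling of the finest graph and return the cheapest one."""
    num_labels = len(unary[0])
    best = min(
        product(range(num_labels), repeat=len(unary)),
        key=lambda labels: energy(labels, unary, edges, potts_weight),
    )
    return list(best), energy(best, unary, edges, potts_weight)


# Hypothetical toy problem: unary[node][label] is the cost of giving `label` to `node`.
unary = [[0, 4], [1, 2], [2, 1], [4, 0]]
edges = [(0, 1), (1, 2), (2, 3)]  # chain of neighboring supervoxels

labels, value = brute_force_minimize(unary, edges, potts_weight=1)
print(labels, value)  # [0, 0, 1, 1] 3
```

Exhaustive search costs a number of evaluations exponential in the number of supervoxels, which is why practical systems rely on solvers such as graph cuts or belief propagation. The sketch serves only as a reference solution for tiny instances.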
