Causal video object segmentation from persistence of occlusions

Occlusion relations inform the partition of the image domain into “objects” but are difficult to determine from a single image or short-baseline video. We show how long-term occlusion relations can be robustly inferred from video, and used within a convex optimization framework to segment the image domain into regions. We highlight the challenges in determining these occluder/occluded relations and ensuring regions remain temporally consistent, propose strategies to overcome them, and introduce an efficient numerical scheme to perform the partition directly on the pixel grid, without the need for superpixelization or other preprocessing steps.

[1]  Stephen P. Boyd,et al.  Convex Optimization , 2004, Algorithms and Theory of Computation Handbook.

[2]  Michael J. Black,et al.  Layered segmentation and optical flow estimation over time , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[3]  Thomas Brox,et al.  A Unified Video Segmentation Benchmark: Annotation, Metrics and Analysis , 2013, 2013 IEEE International Conference on Computer Vision.

[4]  Stefano Soatto,et al.  Detachable Object Detection: Segmentation and Depth Ordering from Short-Baseline Video , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[5]  John W. Fisher,et al.  Topology-Constrained Layered Tracking with Latent Flow , 2013, 2013 IEEE International Conference on Computer Vision.

[6]  Jitendra Malik,et al.  Occlusion boundary detection and figure/ground assignment from optical flow , 2011, CVPR 2011.

[7]  Eric L. Miller,et al.  Multiple Hypothesis Video Segmentation from Superpixel Flows , 2010, ECCV.

[8]  Vittorio Ferrari,et al.  Fast Object Segmentation in Unconstrained Video , 2013, 2013 IEEE International Conference on Computer Vision.

[9]  Stefano Soatto,et al.  Sparse Occlusion Detection with Optical Flow , 2012, International Journal of Computer Vision.

[10]  A. Criminisi,et al.  Bilayer Segmentation of Live Video , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[11]  F. Durand,et al.  Flash photography enhancement via intrinsic relighting , 2004, ACM Trans. Graph..

[12]  Andrew Zisserman,et al.  Learning Layered Motion Segmentations of Video , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[13]  Yong Jae Lee,et al.  Key-segments for video object segmentation , 2011, 2011 International Conference on Computer Vision.

[14]  C. Lawrence Zitnick,et al.  Structured Forests for Fast Edge Detection , 2013, 2013 IEEE International Conference on Computer Vision.

[15]  M. Wertheimer Laws of organization in perceptual forms. , 1938 .

[16]  Antonin Chambolle,et al.  A First-Order Primal-Dual Algorithm for Convex Problems with Applications to Imaging , 2011, Journal of Mathematical Imaging and Vision.

[17]  Jitendra Malik,et al.  Tracking as Repeated Figure/Ground Segmentation , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[18]  Calvin C. Zhao Critical Review : Contour Detection and Hierarchical Image Segmentation , 2015 .

[19]  Fernand Meyer,et al.  Motion Segmenation and Depth Ordering Based on Morphological Segmentation , 1998, ECCV.

[20]  A. Pentland,et al.  Robust estimation of a multi-layered motion representation , 1991, Proceedings of the IEEE Workshop on Visual Motion.

[21]  Irfan A. Essa,et al.  Motion based decompositing of video , 1999, Proceedings of the Seventh IEEE International Conference on Computer Vision.

[22]  Daniel Cremers,et al.  A Coding-Cost Framework for Super-Resolution Motion Layer Decomposition , 2012, IEEE Transactions on Image Processing.

[23]  Jitendra Malik,et al.  Ieee Transactions on Pattern Analysis and Machine Intelligence Segmentation of Moving Objects by Long Term Video Analysis , 2022 .

[24]  Paul Smith,et al.  Layered motion segmentation and depth ordering by tracking edges , 2004, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[25]  Brendan J. Frey,et al.  Learning flexible sprites in video layers , 2001, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001.

[26]  Alan L. Yuille,et al.  Efficient Multilevel Brain Tumor Segmentation With Integrated Bayesian Model Classification , 2008, IEEE Transactions on Medical Imaging.

[27]  Mubarak Shah,et al.  Video Object Segmentation through Spatially Accurate and Temporally Dense Extraction of Primary Object Regions , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[28]  Stefano Soatto,et al.  Dynamic Shape and Appearance Modeling via Moving and Deforming Layers , 2005, International Journal of Computer Vision.

[29]  Chenliang Xu,et al.  Streaming Hierarchical Video Segmentation , 2012, ECCV.

[30]  Edward H. Adelson,et al.  Representing moving images with layers , 1994, IEEE Trans. Image Process..

[31]  Ganesh Sundaramoorthi,et al.  Modeling Self-Occlusions in Dynamic Shape and Appearance Tracking , 2013, 2013 IEEE International Conference on Computer Vision.

[32]  William Brendel,et al.  Video object segmentation by tracking regions , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[33]  Guillermo Sapiro,et al.  Video SnapCut: robust video object cutout using localized classifiers , 2009, ACM Trans. Graph..

[34]  Michael J. Black,et al.  Secrets of optical flow estimation and their principles , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[35]  Mei Han,et al.  Efficient hierarchical graph-based video segmentation , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[36]  Bernt Schiele,et al.  Video Segmentation with Superpixels , 2012, ACCV.