Leveraging Occlusions for Causal Video Object Segmentation

This thesis describes a framework that leverages occlusions as a cue for detecting objects and accurately localizing their boundaries throughout a video. Triggered by the motion of objects in the scene, occlusions provide coarse knowledge of the spatial relationship of objects with respect to the viewer. Occlusions are effective for detecting objects when motion is sufficient, so we explore ways to reliably detect and track objects when motion is inadequate or difficult to estimate. In the first half, we incorporate semantic classifiers to provide cues when occlusions are weak, and observe that occlusion and appearance information are mutually beneficial, yielding results that are more resilient to the failures each component system exhibits when acting alone. This system is evaluated on the semantic segmentation task. In the latter half, we drop semantics and instead devise a causal framework that integrates segmentation results and occlusion cues from previously processed frames. So long as objects move sufficiently with respect to the viewer at some point, they are detected and subsequently tracked for the rest of the video. We evaluate this approach on the video object segmentation problem. The resulting system can automatically discover objects from occlusions in video and track their shapes as they evolve over time. Coarse depth is provided as a byproduct, and the assignment of semantic category labels can be integrated in a natural way.
