Video segmentation by tracing discontinuities in a trajectory embedding

Our goal is to segment a video sequence into moving objects and the world scene. In recent work, spectral embedding of point trajectories based on 2D motion cues accumulated from their lifespans, has shown to outperform factorization and per frame segmentation methods for video segmentation. The scale and kinematic nature of the moving objects and the background scene determine how close or far apart trajectories are placed in the spectral embedding. Such density variations may confuse clustering algorithms, causing over-fragmentation of object interiors. Therefore, instead of clustering in the spectral embedding, we propose detecting discontinuities of embedding density between spatially neighboring trajectories. Detected discontinuities are strong indicators of object boundaries and thus valuable for video segmentation. We propose a novel embedding discretization process that recovers from over-fragmentations by merging clusters according to discontinuity evidence along inter-cluster boundaries. For segmenting articulated objects, we combine motion grouping cues with a center-surround saliency operation, resulting in “context-aware”, spatially coherent, saliency maps. Figure-ground segmentation obtained from saliency thresholding, provides object connectedness constraints that alter motion based trajectory affinities, by keeping articulated parts together and separating disconnected in time objects. Finally, we introduce Gabriel graphs as effective per frame superpixel maps for converting trajectory clustering to dense image segmentation. Gabriel edges bridge large contour gaps via geometric reasoning without over-segmenting coherent image regions. We present experimental results of our method that outperform the state-of-the-art in challenging motion segmentation datasets.

[1]  Jianbo Shi,et al.  Multiclass spectral clustering , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[2]  Jitendra Malik,et al.  Normalized cuts and image segmentation , 1997, Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[3]  Jitendra Malik,et al.  Occlusion boundary detection and figure/ground assignment from optical flow , 2011, CVPR 2011.

[4]  Jitendra Malik,et al.  Recovering human body configurations: combining segmentation and recognition , 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004..

[5]  Katerina Fragkiadaki,et al.  Detection free tracking: Exploiting motion and topology for segmenting and tracking under entanglement , 2011, CVPR 2011.

[6]  D. Matula,et al.  Properties of Gabriel Graphs Relevant to Geographic Variation Research and the Clustering of Points in the Plane , 2010 .

[7]  Deva Ramanan,et al.  Efficiently Scaling Up Video Annotation with Crowdsourced Marketplaces , 2010, ECCV.

[8]  Takeo Kanade,et al.  A multi-body factorization method for motion analysis , 1995, Proceedings of IEEE International Conference on Computer Vision.

[9]  Jitendra Malik,et al.  Scale-invariant contour completion using conditional random fields , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[10]  Laurence Commissaire divisionnaire,et al.  Mid-level Cues Improve Boundary Detection , 2005 .

[11]  M. Wertheimer Laws of organization in perceptual forms. , 1938 .

[12]  Roberto Cipolla,et al.  Unsupervised Bayesian Detection of Independent Motion in Crowds , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[13]  Thomas Brox,et al.  Object segmentation in video: A hierarchical variational approach for turning point trajectories into dense regions , 2011, 2011 International Conference on Computer Vision.

[14]  Jitendra Malik,et al.  From contours to regions: An empirical evaluation , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[15]  Daniel P. Huttenlocher,et al.  Efficient Graph-Based Image Segmentation , 2004, International Journal of Computer Vision.

[16]  Martial Hebert,et al.  Learning to Find Object Boundaries Using Motion Cues , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[17]  Esa Rahtu,et al.  Segmenting Salient Objects from Images and Videos , 2010, ECCV.

[18]  Jitendra Malik,et al.  Using contours to detect and localize junctions in natural images , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[19]  Jitendra Malik,et al.  Object Segmentation by Long Term Analysis of Point Trajectories , 2010, ECCV.

[20]  Marc Pollefeys,et al.  A General Framework for Motion Segmentation: Independent, Articulated, Rigid, Non-rigid, Degenerate and Non-degenerate , 2006, ECCV.

[21]  C Tomasi,et al.  Shape and motion from image streams: a factorization method. , 1992, Proceedings of the National Academy of Sciences of the United States of America.

[22]  Patrick Pérez,et al.  Clustering Point Trajectories with Various Life-Spans , 2009, 2009 Conference for Visual Media Production.

[23]  Olga Veksler,et al.  Fast approximate energy minimization via graph cuts , 2001, Proceedings of the Seventh IEEE International Conference on Computer Vision.

[24]  Nuno Vasconcelos,et al.  On the plausibility of the discriminant center-surround hypothesis for visual saliency. , 2008, Journal of vision.

[25]  René Vidal,et al.  Motion segmentation via robust subspace separation in the presence of outlying, incomplete, or corrupted trajectories , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[26]  Kurt Keutzer,et al.  Dense Point Trajectories by GPU-Accelerated Large Displacement Optical Flow , 2010, ECCV.