Efficient hierarchical graph-based video segmentation

We present an efficient and scalable technique for spatio-temporal segmentation of long video sequences using a hierarchical graph-based algorithm. We begin by over-segmenting a volumetric video graph into space-time regions grouped by appearance. We then construct a “region graph” over the obtained segmentation and iteratively repeat this process over multiple levels to create a tree of spatio-temporal segmentations. This hierarchical approach generates high-quality segmentations that are temporally coherent with stable region boundaries, and it allows subsequent applications to choose from varying levels of granularity. We further improve segmentation quality by using dense optical flow to guide temporal connections in the initial graph. We also propose two novel approaches to improve the scalability of our technique: (a) a parallel out-of-core algorithm that can process volumes much larger than an in-core algorithm can, and (b) a clip-based processing algorithm that divides the video into overlapping clips in time and segments them successively while enforcing consistency. We demonstrate hierarchical segmentations on video shots as long as 40 seconds, and we also support a streaming mode for arbitrarily long videos, albeit without the ability to process them hierarchically.
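
The pipeline described above (over-segment a voxel graph, build a region graph, repeat at coarser levels) can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions: the merge rule is a Felzenszwalb-Huttenlocher style criterion, regions are described by their mean intensity only, temporal edges link each pixel to the same position in the next frame (zero flow, rather than flow-guided links), and all names and parameters (`segment_graph`, `k`, `scale`, and so on) are hypothetical.

```python
from itertools import product


class DisjointSet:
    """Union-find that also tracks component size and largest merged edge."""

    def __init__(self, n):
        self.parent = list(range(n))
        self.size = [1] * n
        self.internal = [0.0] * n  # largest edge weight inside each component

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b, w):
        if self.size[a] < self.size[b]:
            a, b = b, a
        self.parent[b] = a
        self.size[a] += self.size[b]
        self.internal[a] = max(self.internal[a], self.internal[b], w)


def segment_graph(n, edges, k):
    """Merge nodes greedily by ascending edge weight (assumed criterion).

    edges is a list of (weight, u, v); returns one contiguous label per node.
    """
    ds = DisjointSet(n)
    for w, u, v in sorted(edges):
        a, b = ds.find(u), ds.find(v)
        if a != b and w <= min(ds.internal[a] + k / ds.size[a],
                               ds.internal[b] + k / ds.size[b]):
            ds.union(a, b, w)
    roots = {}
    return [roots.setdefault(ds.find(i), len(roots)) for i in range(n)]


def voxel_edges(video):
    """6-connected space-time graph over video[t][y][x] intensities."""
    T, H, W = len(video), len(video[0]), len(video[0][0])
    edges = []
    for t, y, x in product(range(T), range(H), range(W)):
        for dt, dy, dx in ((0, 0, 1), (0, 1, 0), (1, 0, 0)):
            t2, y2, x2 = t + dt, y + dy, x + dx
            if t2 < T and y2 < H and x2 < W:
                w = abs(video[t][y][x] - video[t2][y2][x2])
                edges.append((w, (t * H + y) * W + x, (t2 * H + y2) * W + x2))
    return edges


def region_graph(labels, values, edges):
    """One node per region; edges join adjacent regions, weighted by mean difference."""
    n = max(labels) + 1
    total, count = [0.0] * n, [0] * n
    for lab, val in zip(labels, values):
        total[lab] += val
        count[lab] += 1
    mean = [s / c for s, c in zip(total, count)]
    pairs = {(min(labels[u], labels[v]), max(labels[u], labels[v]))
             for _, u, v in edges if labels[u] != labels[v]}
    return n, [(abs(mean[a] - mean[b]), a, b) for a, b in pairs]


def hierarchical_segmentation(video, k=1.0, levels=3, scale=2.0):
    """Return a list of per-voxel labelings, from finest to coarsest."""
    values = [v for frame in video for row in frame for v in row]
    base_edges = voxel_edges(video)
    labels = segment_graph(len(values), base_edges, k)
    hierarchy = [labels]
    for _ in range(levels - 1):
        k *= scale  # a larger k favors larger regions at the next level
        n, r_edges = region_graph(labels, values, base_edges)
        merged = segment_graph(n, r_edges, k)
        labels = [merged[lab] for lab in labels]
        hierarchy.append(labels)
    return hierarchy


# Two frames, 2x6 pixels each: two similar bands and one very different band.
video = [[[10, 10, 12, 12, 200, 200] for _ in range(2)] for _ in range(2)]
levels = hierarchical_segmentation(video)
print([len(set(level)) for level in levels])  # [3, 2, 2]
```

In this toy input the two similar bands stay separate at the finest level and merge one level up, while the bright band remains its own region throughout. Every level labels the same voxels, so a downstream application could pick whichever granularity suits it.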
