Video SnapCut: robust video object cutout using localized classifiers

Although interactive object cutout in still images has seen tremendous success, accurately extracting dynamic objects from video remains a very challenging problem. Previous video cutout systems have two major limitations: (1) they rely on global statistics, and thus cannot deal with complex and diverse scenes; and (2) they treat segmentation as a global optimization, and thus lack a practical workflow that can guarantee convergence to the desired results. We present Video SnapCut, a robust video object cutout system that significantly advances the state of the art. In our system, segmentation is achieved by the collaboration of a set of local classifiers, each adaptively integrating multiple local image features. We show how this segmentation paradigm naturally supports local user edits and propagates them across time. The object cutout system is completed with a novel coherent video matting technique. A comprehensive evaluation and comparison are presented, demonstrating the effectiveness of the proposed system at achieving high-quality results, as well as its robustness against various types of input.
