Video Object Segmentation through Spatially Accurate and Temporally Dense Extraction of Primary Object Regions

In this paper, we propose a novel approach to extract primary object segments in videos in the `object proposal' domain. The extracted primary object regions are then used to build object models for optimized video segmentation. The proposed approach has several contributions: First, a novel layered Directed Acyclic Graph (DAG) based framework is presented for detection and segmentation of the primary object in video. We exploit the fact that, in general, objects are spatially cohesive and characterized by locally smooth motion trajectories, to extract the primary object from the set of all available proposals based on motion, appearance and predicted-shape similarity across frames. Second, the DAG is initialized with an enhanced object proposal set where motion based proposal predictions (from adjacent frames) are used to expand the set of object proposals for a particular frame. Last, the paper presents a motion scoring function for selection of object proposals that emphasizes high optical flow gradients at proposal boundaries to discriminate between moving objects and the background. The proposed approach is evaluated using several challenging benchmark videos and it outperforms both unsupervised and supervised state-of-the-art methods.

[1]  William Brendel,et al.  Video object segmentation by tracking regions , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[2]  Eric L. Miller,et al.  Multiple Hypothesis Video Segmentation from Superpixel Flows , 2010, ECCV.

[3]  Andrew Blake,et al.  "GrabCut" , 2004, ACM Trans. Graph..

[4]  Guillermo Sapiro,et al.  Video SnapCut: robust video object cutout using localized classifiers , 2009, ACM Trans. Graph..

[5]  Stefano Soatto,et al.  Class segmentation and object localization with superpixel neighborhoods , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[6]  Derek Hoiem,et al.  Category Independent Object Proposals , 2010, ECCV.

[7]  Jitendra Malik,et al.  Motion segmentation and tracking using normalized cuts , 1998, Sixth International Conference on Computer Vision (IEEE Cat. No.98CH36271).

[8]  Takeo Kanade,et al.  Background Subtraction for Freely Moving Cameras , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[9]  Jitendra Malik,et al.  Object Segmentation by Long Term Analysis of Point Trajectories , 2010, ECCV.

[10]  Bo Thiesson,et al.  Image and Video Segmentation by Anisotropic Kernel Mean Shift , 2004, ECCV.

[11]  Chenliang Xu,et al.  Streaming Hierarchical Video Segmentation , 2012, ECCV.

[12]  Stanley T. Birchfield,et al.  Adaptive fragments-based tracking of non-rigid objects using level sets , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[13]  Thomas Deselaers,et al.  What is an object? , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[14]  Dimitris N. Metaxas,et al.  ]Video object segmentation by hypergraph cut , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[15]  Scott Cohen,et al.  LIVEcut: Learning-based interactive video segmentation by evaluation of multiple propagated cues , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[16]  Jitendra Malik,et al.  Tracking as Repeated Figure/Ground Segmentation , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[17]  Mei Han,et al.  Efficient hierarchical graph-based video segmentation , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[18]  Antonio Torralba,et al.  LabelMe video: Building a video database with human annotations , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[19]  Cristian Sminchisescu,et al.  Constrained parametric min-cuts for automatic object segmentation , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[20]  Éva Tardos,et al.  Algorithm design , 2005 .

[21]  Yong Jae Lee,et al.  Key-segments for video object segmentation , 2011, 2011 International Conference on Computer Vision.

[22]  Pieter Peers,et al.  SubEdit: a representation for editing measured heterogeneous subsurface scattering , 2009, SIGGRAPH 2009.

[23]  James M. Rehg,et al.  Motion Coherent Tracking with Multi-label MRF optimization , 2010, BMVC.

[24]  Longin Jan Latecki,et al.  Maximum weight cliques with mutex constraints for video object segmentation , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.