Semantic Single Video Segmentation with Robust Graph Representation

Recent work has demonstrated the influence of graph-based video segmentation. However, most existing approaches do not segment the foreground objects semantically, i.e., all segmented objects are treated as a single class. In this paper, we propose an approach to semantically segment multi-class foreground objects from a single video sequence. To achieve this, we first generate a set of proposals for each frame and score them based on motion and appearance features. Using these scores, we measure the pairwise similarities between proposals. To reduce the vulnerability of the graph-based model to outliers, we propose a low-rank representation with an ℓ2,1-norm regularizer for outlier detection, which discovers the intrinsic structure among the proposals. With this "clean" graph representation, objects of different classes are more likely to be grouped into separate clusters. Two public datasets, MOViCS and ObMiC, are used for evaluation under both the intersection-over-union and F-measure metrics. The results, which are superior to those of state-of-the-art methods, demonstrate the effectiveness of the proposed method.
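The abstract names a low-rank representation with an ℓ2,1-norm regularizer but does not give its formulation. The sketch below assumes the standard low-rank representation (LRR) objective of Liu et al. (ICML 2010), min_{Z,E} ||Z||_* + λ||E||_{2,1} subject to X = XZ + E, solved with an inexact augmented Lagrange multiplier method that introduces an auxiliary variable J = Z. Each column of X is assumed to be one proposal descriptor; this input, the function names `lrr_l21`, `lrr_affinity`, `outlier_scores`, and the parameter defaults are hypothetical and are not the authors' implementation. Because the ℓ2,1-norm is the sum of column ℓ2 norms, it drives whole columns of E to zero, so the columns that remain large mark candidate outlier proposals.

```python
import numpy as np


def _svt(M: np.ndarray, tau: float) -> np.ndarray:
    """Singular value thresholding: proximal operator of tau * nuclear norm."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt


def _shrink_columns(Q: np.ndarray, tau: float) -> np.ndarray:
    """Column-wise shrinkage: proximal operator of tau * l2,1-norm."""
    norms = np.linalg.norm(Q, axis=0)
    scale = np.where(norms > tau, (norms - tau) / np.maximum(norms, 1e-12), 0.0)
    return Q * scale  # scales column j by scale[j]


def lrr_l21(X, lam=0.1, rho=1.1, mu=1e-6, max_mu=1e10, tol=1e-6, max_iter=1000):
    """Inexact ALM for  min ||Z||_* + lam * ||E||_{2,1}  s.t.  X = X Z + E.

    X is d x n with one sample per column (hypothetical proposal descriptors).
    Returns the n x n representation Z and the d x n error matrix E.
    """
    d, n = X.shape
    XtX = X.T @ X
    inv = np.linalg.inv(np.eye(n) + XtX)  # fixed system matrix of the Z-update
    Z = np.zeros((n, n))
    E = np.zeros((d, n))
    Y1 = np.zeros((d, n))  # multiplier for X = X Z + E
    Y2 = np.zeros((n, n))  # multiplier for Z = J
    for _ in range(max_iter):
        # J-update: nuclear-norm proximal step via SVT
        J = _svt(Z + Y2 / mu, 1.0 / mu)
        # Z-update: closed-form least-squares solution
        Z = inv @ (XtX - X.T @ E + J + (X.T @ Y1 - Y2) / mu)
        # E-update: l2,1 proximal step zeroes whole columns
        E = _shrink_columns(X - X @ Z + Y1 / mu, lam / mu)
        # Dual ascent on both constraints, then increase the penalty
        R1 = X - X @ Z - E
        R2 = Z - J
        Y1 += mu * R1
        Y2 += mu * R2
        mu = min(rho * mu, max_mu)
        if max(np.abs(R1).max(), np.abs(R2).max()) < tol:
            break
    return Z, E


def lrr_affinity(Z):
    """Symmetric, non-negative affinity matrix built from the representation Z."""
    return (np.abs(Z) + np.abs(Z).T) / 2.0


def outlier_scores(E):
    """Column norms of E; larger values suggest likely outlier samples."""
    return np.linalg.norm(E, axis=0)
```

The affinity `W = lrr_affinity(Z)` could then be handed to a graph clustering method such as spectral clustering; the abstract does not state which clustering algorithm the authors use, so that step is left open here.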

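For reference, the two evaluation metrics named in the abstract can be sketched on binary masks as follows. Intersection-over-union is |P ∩ G| / |P ∪ G| for a predicted mask P and ground-truth mask G, and the F-measure is (1 + β²)·precision·recall / (β²·precision + recall). The default β² = 1 (the balanced F1 score), the handling of empty masks, and per-frame averaging are assumptions, since the abstract does not specify them; multi-class scoring would apply these functions per class after matching predicted clusters to ground-truth classes, which is not shown.

```python
import numpy as np


def iou(pred: np.ndarray, gt: np.ndarray) -> float:
    """Intersection-over-union of two binary masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return 1.0  # assumption: two empty masks count as perfect agreement
    return float(np.logical_and(pred, gt).sum() / union)


def f_measure(pred: np.ndarray, gt: np.ndarray, beta2: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall (beta2 = beta squared)."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    tp = np.logical_and(pred, gt).sum()
    if tp == 0:
        return 0.0
    precision = tp / pred.sum()
    recall = tp / gt.sum()
    return float((1 + beta2) * precision * recall / (beta2 * precision + recall))


def mean_over_frames(metric, preds, gts, **kwargs) -> float:
    """Average a per-frame metric over a sequence of (prediction, ground truth) masks."""
    return float(np.mean([metric(p, g, **kwargs) for p, g in zip(preds, gts)]))
```

For example, a 4×4 prediction covering the top-left 2×2 block against a ground truth covering the top two rows gives an intersection of 4 pixels and a union of 8, so IoU = 0.5, while precision = 1 and recall = 0.5 give F1 = 2/3.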