Paper Information - Joint Graph Learning and Video Segmentation via Multiple Cues and Topology Calibration

Video segmentation has become an important and active research area with a large diversity of proposed approaches. Graph-based methods, which achieve top performance on recent benchmarks, usually focus on either obtaining a precise similarity graph or designing efficient graph-cutting strategies. However, these two components are often carried out in two separate steps, so the obtained similarity graph may not be optimal for segmentation, which can lead to suboptimal results. In this paper, we propose a novel framework, joint graph learning and video segmentation (JGLVS), which learns the similarity graph and the video segmentation simultaneously. JGLVS learns the similarity graph by assigning adaptive neighbors to each vertex based on multiple cues (appearance, motion, boundary and spatial information). Meanwhile, a new rank constraint is imposed on the Laplacian matrix of the similarity graph, such that the number of connected components in the resulting similarity graph is exactly equal to the number of segments. Furthermore, JGLVS can automatically weigh multiple cues and calibrate the pairwise distances of superpixels based on their topological structure. Most notably, empirical results on the challenging VSB100 benchmark show that JGLVS achieves promising performance, outperforming the state of the art by up to 11% on the BPR metric.
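The rank constraint mentioned in the abstract rests on a standard fact from spectral graph theory: for a non-negative symmetric similarity matrix S over n vertices, the Laplacian L = D - S has rank n - c exactly when the graph has c connected components. The sketch below only illustrates that fact on a toy graph; it is not the authors' optimization procedure, and the matrix `S` and the helper names `laplacian`, `matrix_rank`, and `connected_components` are hypothetical, chosen for this example.

```python
from collections import deque
from fractions import Fraction


def laplacian(S):
    """Return L = D - S for a symmetric, non-negative similarity matrix S."""
    n = len(S)
    return [
        [(sum(S[i]) if i == j else 0) - S[i][j] for j in range(n)]
        for i in range(n)
    ]


def matrix_rank(M):
    """Exact rank via Gaussian elimination over rationals (toy sizes only)."""
    A = [[Fraction(x) for x in row] for row in M]
    rows, cols = len(A), len(A[0])
    rank = 0
    for col in range(cols):
        # Find a row at or below the current pivot position with a nonzero entry.
        pivot = next((r for r in range(rank, rows) if A[r][col] != 0), None)
        if pivot is None:
            continue
        A[rank], A[pivot] = A[pivot], A[rank]
        for r in range(rank + 1, rows):
            factor = A[r][col] / A[rank][col]
            A[r] = [a - factor * b for a, b in zip(A[r], A[rank])]
        rank += 1
    return rank


def connected_components(S):
    """Count connected components by breadth-first search over positive weights."""
    n = len(S)
    seen = [False] * n
    count = 0
    for start in range(n):
        if seen[start]:
            continue
        count += 1
        seen[start] = True
        queue = deque([start])
        while queue:
            i = queue.popleft()
            for j in range(n):
                if S[i][j] > 0 and not seen[j]:
                    seen[j] = True
                    queue.append(j)
    return count


# Hypothetical toy graph: five "superpixels" forming two groups, {0, 1, 2} and {3, 4}.
S = [
    [0, 2, 1, 0, 0],
    [2, 0, 3, 0, 0],
    [1, 3, 0, 0, 0],
    [0, 0, 0, 0, 4],
    [0, 0, 0, 4, 0],
]

n = len(S)
c = connected_components(S)
rank_L = matrix_rank(laplacian(S))
print(f"components = {c}, rank(L) = {rank_L}, n - c = {n - c}")
```

In this toy case, requiring rank(L) = n - c with c = 2 is the same as requiring the graph to split into exactly two groups, so the segments can be read directly off the connected components without a separate graph-cutting step.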
