Paper Information - Semantic Co-segmentation in Videos

Semantic Co-segmentation in Videos

Discovering and segmenting objects in videos is a challenging task due to large variations in object appearance, deformed shapes, and cluttered backgrounds. In this paper, we propose to segment objects and understand their visual semantics from a collection of videos that link to each other, which we refer to as semantic co-segmentation. Without any prior knowledge about the videos, we first extract semantic objects and use a tracking-based approach to generate multiple object-like tracklets across each video. Each tracklet maintains temporally connected segments and is associated with a predicted category. To exploit rich information from other videos, we collect the tracklets assigned to the same category from all videos, and co-select the tracklets that belong to true objects by optimizing a submodular function. This function accounts for object properties such as appearance, shape, and motion, and hence facilitates the co-segmentation process. Experiments on three video object segmentation datasets show that the proposed algorithm performs favorably against other state-of-the-art methods.
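The co-selection step can be illustrated with a minimal sketch. The abstract does not give the exact form of the submodular function, so the code below assumes a plain facility-location objective over a pairwise tracklet similarity matrix and maximizes it greedily; the names `objective`, `marginal_gain`, and `greedy_select`, the `budget` parameter, and the similarity matrix itself are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def objective(sim: Matrix, selected: Sequence[int]) -> float:
    """Facility-location value: each tracklet is credited with its best
    similarity to any selected tracklet (0 if nothing is selected)."""
    if not selected:
        return 0.0
    return sum(max(row[j] for j in selected) for row in sim)


def marginal_gain(sim: Matrix, best: Sequence[float], candidate: int) -> float:
    """Increase in the objective if `candidate` is added, given `best[i]`,
    the current best similarity of tracklet i to the selected set."""
    return sum(max(sim[i][candidate] - best[i], 0.0) for i in range(len(sim)))


def greedy_select(sim: Matrix, budget: int, min_gain: float = 1e-9) -> List[int]:
    """Greedily pick up to `budget` tracklets with the largest marginal gain.

    Assumption: `sim` is a square, non-negative similarity matrix built from
    tracklets that share one predicted category across all videos.
    """
    n = len(sim)
    best = [0.0] * n
    selected: List[int] = []
    for _ in range(min(budget, n)):
        candidates = [j for j in range(n) if j not in selected]
        if not candidates:
            break
        choice = max(candidates, key=lambda j: marginal_gain(sim, best, j))
        if marginal_gain(sim, best, choice) <= min_gain:
            break  # no remaining tracklet improves the objective
        selected.append(choice)
        best = [max(best[i], sim[i][choice]) for i in range(n)]
    return selected
```

Given a similarity matrix in which tracklets form two groups, `greedy_select(sim, budget=2)` would tend to return one representative from each group, since a second pick from an already covered group adds little to the objective. For monotone submodular objectives such as this one, greedy selection under a cardinality budget is a standard choice because it carries a known (1 - 1/e) approximation guarantee.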
