Robust video segment proposals with painless occlusion handling

We propose a robust algorithm to generate video segment proposals. The proposals generated by our method can start from any frame in the video and are robust to complete occlusions. Our method does not assume specific motion models and even has a limited capability to generalize across videos. We build on our previous least squares tracking framework, where image segment proposals are generated and tracked using learned appearance models. The innovation in our new method lies in the use of two efficient moves, the merge move and free addition, to efficiently start segments from any frame and track them through complete occlusions, without much additional computation. Segment size interpolation is used for effectively detecting occlusions. We propose a new metric for evaluating video segment proposals on the challenging VSB-100 benchmark and present state-of-the-art results. Preliminary results are also shown for the potential use of our framework to track segments across different videos.

[1]  Larry S. Davis,et al.  Probabilistic framework for segmenting people under occlusion , 2001, Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001.

[2]  Hai Tao,et al.  A background layer model for object tracking through occlusion , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[3]  Xin Li,et al.  Contour-based object tracking with occlusion handling in video acquired using mobile cameras , 2004, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[4]  L. Davis,et al.  M2Tracker: A Multi-View Approach to Segmenting and Tracking People in a Cluttered Scene , 2003, International Journal of Computer Vision.

[5]  Yan Huang,et al.  Tracking multiple objects through occlusions , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[6]  Larry S. Davis,et al.  Multi-camera Tracking and Segmentation of Occluded People on Ground Plane Using Search-Guided Particle Filtering , 2006, ECCV.

[7]  Dimitris N. Metaxas,et al.  ]Video object segmentation by hypergraph cut , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[8]  William Brendel,et al.  Video object segmentation by tracking regions , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[9]  Cristian Sminchisescu,et al.  Constrained parametric min-cuts for automatic object segmentation , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[10]  Jitendra Malik,et al.  Object Segmentation by Long Term Analysis of Point Trajectories , 2010, ECCV.

[11]  Cristian Sminchisescu,et al.  Object recognition as ranking holistic figure-ground hypotheses , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[12]  Eric L. Miller,et al.  Multiple Hypothesis Video Segmentation from Superpixel Flows , 2010, ECCV.

[13]  Mei Han,et al.  Efficient hierarchical graph-based video segmentation , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[14]  Ivan Laptev,et al.  Track to the future: Spatio-temporal video segmentation with long-range motion cues , 2011, CVPR 2011.

[15]  Cristian Sminchisescu,et al.  Object Recognition by Sequential Figure-Ground Ranking , 2011, International Journal of Computer Vision.

[16]  Atsushi Nakazawa,et al.  Motion Coherent Tracking Using Multi-label MRF Optimization , 2012, International Journal of Computer Vision.

[17]  Thomas Brox,et al.  Object segmentation in video: A hierarchical variational approach for turning point trajectories into dense regions , 2011, 2011 International Conference on Computer Vision.

[18]  Yong Jae Lee,et al.  Key-segments for video object segmentation , 2011, 2011 International Conference on Computer Vision.

[19]  Horst Bischof,et al.  Hough-based tracking of non-rigid objects , 2011, 2011 International Conference on Computer Vision.

[20]  Longin Jan Latecki,et al.  Maximum weight cliques with mutex constraints for video object segmentation , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[21]  Bodo Rosenhahn,et al.  Multi-scale Clustering of Frame-to-Frame Correspondences for Motion Segmentation , 2012, ECCV.

[22]  Bohyung Han,et al.  Online Video Segmentation by Bayesian Split-Merge Clustering , 2012, ECCV.

[23]  Chenliang Xu,et al.  Streaming Hierarchical Video Segmentation , 2012, ECCV.

[24]  Katerina Fragkiadaki,et al.  Video segmentation by tracing discontinuities in a trajectory embedding , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[25]  Cristian Sminchisescu,et al.  CPMC: Automatic Object Segmentation Using Constrained Parametric Min-Cuts , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[26]  James M. Rehg,et al.  Video Segmentation by Tracking Many Figure-Ground Segments , 2013, 2013 IEEE International Conference on Computer Vision.

[27]  Philippe Salembier,et al.  Hierarchical Video Representation with Trajectory Binary Partition Tree , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[28]  Koen E. A. van de Sande,et al.  Selective Search for Object Recognition , 2013, International Journal of Computer Vision.

[29]  Thomas Brox,et al.  A Unified Video Segmentation Benchmark: Annotation, Metrics and Analysis , 2013, 2013 IEEE International Conference on Computer Vision.

[30]  Santiago Manen,et al.  Online Video SEEDS for Temporal Window Objectness , 2013, 2013 IEEE International Conference on Computer Vision.

[31]  Mubarak Shah,et al.  Video Object Segmentation through Spatially Accurate and Temporally Dense Extraction of Primary Object Regions , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[32]  Vittorio Ferrari,et al.  Fast Object Segmentation in Unconstrained Video , 2013, 2013 IEEE International Conference on Computer Vision.

[33]  Cristian Sminchisescu,et al.  Video Object Segmentation by Salient Segment Chain Composition , 2013, 2013 IEEE International Conference on Computer Vision Workshops.

[34]  René Vidal,et al.  Coarse-to-Fine Semantic Video Segmentation Using Supervoxel Trees , 2013, 2013 IEEE International Conference on Computer Vision.

[35]  John W. Fisher,et al.  A Video Representation Using Temporal Superpixels , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[36]  Ignas Budvytis,et al.  Mixture of Trees Probabilistic Graphical Model for Video Segmentation , 2013, International Journal of Computer Vision.

[37]  Andrew Zisserman,et al.  Return of the Devil in the Details: Delving Deep into Convolutional Nets , 2014, BMVC.

[38]  Trevor Darrell,et al.  Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[39]  Jitendra Malik,et al.  Simultaneous Detection and Segmentation , 2014, ECCV.

[40]  James M. Rehg,et al.  RIGOR: Reusing Inference in Graph Cuts for Generating Object Regions , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[41]  Thomas Brox,et al.  Spectral Graph Reduction for Efficient Image and Streaming Video Segmentation , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[42]  Jonathan T. Barron,et al.  Multiscale Combinatorial Grouping , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[43]  Jitendra Malik,et al.  Ieee Transactions on Pattern Analysis and Machine Intelligence Segmentation of Moving Objects by Long Term Video Analysis , 2022 .

[44]  Vladlen Koltun,et al.  Geodesic Object Proposals , 2014, ECCV.

[45]  Nanning Zheng,et al.  Video Object Discovery and Co-Segmentation with Extremely Weak Supervision , 2017, IEEE Trans. Pattern Anal. Mach. Intell..