Object Detection, Tracking, and Motion Segmentation for Object-level Video Segmentation

We present an approach for object segmentation in videos that combines frame-level object detection with concepts from object tracking and motion segmentation. The approach extracts temporally consistent object tubes from the output of an off-the-shelf detector. Each tube provides a class label and, in addition, a location prior that is independent of motion. For the final video segmentation, we combine this information with motion cues. The method overcomes typical problems of weakly supervised and unsupervised video segmentation, such as scenes without motion, dominant camera motion, and objects that move together as a unit. In contrast to most tracking methods, it provides an accurate, temporally consistent segmentation of each object. We report results on four video segmentation datasets: YouTube Objects, SegTrackv2, egoMotion, and FBMS.
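To make the tube-extraction step more concrete, the following is a minimal Python sketch of one common way to turn per-frame detections into object tubes: greedy association of each detection with the same-class tube whose most recent box overlaps it most, measured by intersection over union (IoU). This greedy IoU linking is an assumption chosen for illustration, not the linking procedure used in this work, and the names `Detection`, `Tube`, `link_detections`, `min_iou`, and `max_gap` are hypothetical.

```python
from dataclasses import dataclass, field

Box = tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


@dataclass
class Detection:
    frame: int
    box: Box
    label: str
    score: float


@dataclass
class Tube:
    label: str
    detections: list[Detection] = field(default_factory=list)

    @property
    def last(self) -> Detection:
        # Most recent detection appended to this tube.
        return self.detections[-1]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def link_detections(frames: list[list[Detection]],
                    min_iou: float = 0.5,
                    max_gap: int = 1) -> list[Tube]:
    """Greedily link per-frame detections into class-consistent tubes.

    Hypothetical sketch: a detection extends the same-label tube whose last
    box overlaps it most (IoU >= min_iou), provided that tube was last seen
    at most max_gap frames earlier; otherwise it starts a new tube.
    """
    tubes: list[Tube] = []
    for dets in frames:
        used: set[int] = set()  # each tube takes at most one detection per frame
        # Confident detections claim tubes first.
        for det in sorted(dets, key=lambda d: d.score, reverse=True):
            best, best_iou = None, min_iou
            for i, tube in enumerate(tubes):
                if i in used or tube.label != det.label:
                    continue
                if det.frame - tube.last.frame > max_gap:
                    continue
                overlap = iou(tube.last.box, det.box)
                if overlap >= best_iou:
                    best, best_iou = i, overlap
            if best is None:
                tubes.append(Tube(det.label, [det]))
                used.add(len(tubes) - 1)
            else:
                tubes[best].detections.append(det)
                used.add(best)
    return tubes
```

Processing detections in descending score order lets confident detections claim tubes first, and setting `max_gap` above 1 lets a tube bridge frames in which the detector missed the object. The subsequent step, combining each tube's location prior with motion cues into a pixel-accurate segmentation, is not covered by this sketch.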
