Multilevel Model for Video Object Segmentation Based on Supervision Optimization

In this work, we present a supervised object segmentation algorithm for unconstrained video. Instead of arbitrarily picking a few frames for manual labeling, as many existing supervised methods do, the proposed method selects the frames to label in a more reasoned manner, which we call supervision optimization. To this end, we formulate a principled objective function that infers the propagation error from appearance and motion cues. We then construct a multilevel segmentation model that combines low-level and high-level features. At the low level, image pixels are used for a more accurate estimation of motion and segmentation. At the high level, image segments are used for a more semantic classification of foreground and background. Integrating both levels in a single segmentation graph further improves the result by leveraging the knowledge from each level. In experiments, the proposed approach is evaluated with several measures, and the results on a benchmark demonstrate its effectiveness in comparison with other state-of-the-art algorithms.
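To make the idea of supervision optimization concrete, the following is a minimal sketch of frame selection driven by an estimated propagation error. It is not the paper's objective function, which the abstract does not spell out. It assumes a hypothetical matrix `err`, where `err[i][j]` is the estimated error of propagating a label from frame `i` to frame `j`, and it uses a simple greedy search rather than whatever optimizer the authors use. The toy `temporal_error` model (error grows with frame distance) is likewise an assumption standing in for the appearance and motion cues.

```python
from typing import List, Sequence


def total_error(err: Sequence[Sequence[float]], labeled: Sequence[int]) -> float:
    """Sum, over all frames, of the smallest error from any labeled frame."""
    n = len(err)
    return sum(min(err[i][j] for i in labeled) for j in range(n))


def select_frames(err: Sequence[Sequence[float]], k: int) -> List[int]:
    """Greedily pick up to k frames to label so that total_error stays small.

    Greedy selection is an illustrative choice; it is not guaranteed optimal.
    """
    n = len(err)
    chosen: List[int] = []
    for _ in range(min(k, n)):
        # Add the unlabeled frame that lowers the total error the most.
        best = min(
            (i for i in range(n) if i not in chosen),
            key=lambda i: total_error(err, chosen + [i]),
        )
        chosen.append(best)
    return sorted(chosen)


def temporal_error(n: int) -> List[List[float]]:
    """Toy error model (assumption): error grows linearly with frame distance."""
    return [[float(abs(i - j)) for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    err = temporal_error(7)
    frames = select_frames(err, 2)
    print(frames, total_error(err, frames))
```

With a single label and the toy model, the sketch picks the middle frame, since that minimizes the summed distance to all other frames. In a real pipeline the error matrix would presumably be estimated from the video itself, which is the part the paper's objective function addresses.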
