Learning Object Deformation and Motion Adaption for Semi-supervised Video Object Segmentation

We propose a novel method to solve the task of semi-supervised video object segmentation in this paper, where the mask annotation is only given at the first frame of the video sequence. A mask-propagation-based model is applied to learn the past and current information for segmentation. Besides, due to the scarcity of training data, image/mask pairs that model object deformation and shape variance are generated for the training phase. In addition, we generate the key flips between two adjacent frames for motion adaptation. The method works in an end-to-end way, without any online fine-tuning on test videos. Extensive experiments demonstrate that our method achieves competitive performance against state-of-the-art methods on benchmark datasets, covering cases with single object or multiple objects. We also conduct extensive ablation experiments to analyze the effectiveness of our proposed method.

[1]  Ronan Collobert,et al.  Learning to Refine Object Segments , 2016, ECCV.

[2]  R. Venkatesh Babu,et al.  SeamSeg: Video Object Segmentation Using Patch Seams , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[3]  Wei Liu,et al.  CNN in MRF: Video Object Segmentation via Inference in a CNN-Based Higher-Order Spatio-Temporal MRF , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[4]  Luc Van Gool,et al.  The 2017 DAVIS Challenge on Video Object Segmentation , 2017, ArXiv.

[5]  Mei Han,et al.  Efficient hierarchical graph-based video segmentation , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[6]  Fei-Fei Li,et al.  Shifting Weights: Adapting Object Detectors from Image to Video , 2012, NIPS.

[7]  Jian Yang,et al.  Spatio-Temporal Graph Convolution for Skeleton Based Action Recognition , 2018, AAAI.

[8]  Ming-Hsuan Yang,et al.  Fast and Accurate Online Video Object Segmentation via Tracking Parts , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[9]  Bernt Schiele,et al.  Learning Video Object Segmentation from Static Images , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[10]  Luc Van Gool,et al.  The Pascal Visual Object Classes (VOC) Challenge , 2010, International Journal of Computer Vision.

[11]  Sergey Ioffe,et al.  Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.

[12]  Peter V. Gehler,et al.  Video Propagation Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[13]  Aggelos K. Katsaggelos,et al.  Efficient Video Object Segmentation via Network Modulation , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[14]  Bastian Leibe,et al.  PReMVOS: Proposal-generation, Refinement and Merging for Video Object Segmentation , 2018, ACCV.

[15]  Guosheng Lin,et al.  MoNet: Deep Motion Exploitation for Video Object Segmentation , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[16]  Bin Zhao,et al.  HSA-RNN: Hierarchical Structure-Adaptive RNN for Video Summarization , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[17]  Song-Chun Zhu,et al.  A Causal And-Or Graph Model for Visibility Fluent Reasoning in Tracking Interacting Objects , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[18]  K.-K. Maninis,et al.  Video Object Segmentation without Temporal Information , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[19]  Ning Xu,et al.  YouTube-VOS: A Large-Scale Video Object Segmentation Benchmark , 2018, ArXiv.

[20]  Chong Luo,et al.  A Twofold Siamese Network for Real-Time Object Tracking , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[21]  Kenneth D. Forbus,et al.  Action Recognition From Skeleton Data via Analogical Generalization Over Qualitative Representations , 2018, AAAI.

[22]  Bastian Leibe,et al.  Online Adaptation of Convolutional Neural Networks for Video Object Segmentation , 2017, BMVC.

[23]  Yizhou Wang,et al.  Video Object Segmentation by Learning Location-Sensitive Embeddings , 2018, ECCV.

[24]  Vittorio Ferrari,et al.  Fast Object Segmentation in Unconstrained Video , 2013, 2013 IEEE International Conference on Computer Vision.

[25]  Kalyan Sunkavalli,et al.  Fast Video Object Segmentation by Reference-Guided Mask Propagation , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[26]  Luc Van Gool,et al.  A Benchmark Dataset and Evaluation Methodology for Video Object Segmentation , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[27]  Bernt Schiele,et al.  Classifier based graph construction for video segmentation , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[28]  Shi-Min Hu,et al.  Global contrast based salient region detection , 2011, CVPR 2011.

[29]  Michael S. Bernstein,et al.  ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.

[30]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[31]  Luc Van Gool,et al.  Blazingly Fast Video Object Segmentation with Pixel-Wise Metric Learning , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[32]  Ming-Hsuan Yang,et al.  SegFlow: Joint Learning for Video Object Segmentation and Optical Flow , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[33]  Thomas Brox,et al.  Motion Trajectory Segmentation via Minimum Cost Multicuts , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[34]  Chang-Su Kim,et al.  Online Video Object Segmentation via Convolutional Trident Network , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[35]  Xiangyu Zhang,et al.  Large Kernel Matters — Improve Semantic Segmentation by Global Convolutional Network , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[36]  Xiaoxiao Li,et al.  Video Object Segmentation with Joint Re-identification and Attention-Aware Mask Propagation , 2018, ECCV.

[37]  Michael Lam,et al.  Unsupervised Video Summarization with Adversarial LSTM Networks , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[38]  Michael J. Black,et al.  Video Segmentation via Object Flow , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[39]  Luc Van Gool,et al.  One-Shot Video Object Segmentation , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).