Video Object Segmentation Using Global and Instance Embedding Learning

In this paper, we propose a feature embedding based video object segmentation (VOS) method which is simple, fast and effective. The current VOS task involves two main challenges: object instance differentiation and cross-frame instance alignment. Most state-of-the-art matching based VOS methods simplify this task into a binary segmentation task and tackle each instance independently. In contrast, we decompose the VOS task into two subtasks: global embedding learning that segments foreground objects of each frame in a pixel-to-pixel manner, and instance feature embedding learning that separates instances. The outputs of these two subtasks are fused to obtain the final instance masks quickly and accurately. Through using the relation among different instances per-frame as well as temporal relation across different frames, the proposed network learns to differentiate multiple instances and associate them properly in one feed-forward manner. Extensive experimental results on the challenging DAVIS[34] and Youtube-VOS [57] datasets show that our method achieves better performances than most counterparts in each case.

[1]  Chunhua Shen,et al.  BlendMask: Top-Down Meets Bottom-Up for Instance Segmentation , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[2]  Luc Van Gool,et al.  A Benchmark Dataset and Evaluation Methodology for Video Object Segmentation , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[3]  Xiaojuan Qi,et al.  AGSS-VOS: Attention Guided Single-Shot Video Object Segmentation , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[4]  Bernt Schiele,et al.  Learning Video Object Segmentation from Static Images , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[5]  Ming-Hsuan Yang,et al.  Fast and Accurate Online Video Object Segmentation via Tracking Parts , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[6]  Gang Wang,et al.  Motion-Guided Cascaded Refinement Network for Video Object Segmentation , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[7]  Erika Lu,et al.  MAST: A Memory-Augmented Self-Supervised Tracker , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[8]  Michael J. Black,et al.  Video Segmentation via Object Flow , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[9]  Alexander G. Schwing,et al.  MaskRNN: Instance Level Video Object Segmentation , 2018, NIPS.

[10]  Yu Li,et al.  Fast Video Object Segmentation using the Global Context Module , 2020, ECCV.

[11]  Yunchao Wei,et al.  Collaborative Video Object Segmentation by Foreground-Background Integration , 2020, ECCV.

[12]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[13]  Luc Van Gool,et al.  Video Object Segmentation with Episodic Graph Memory Networks , 2020, ECCV.

[14]  Hao Shen,et al.  CenterMask: Single Shot Instance Segmentation With Point Representation , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[15]  Quoc V. Le,et al.  CondConv: Conditionally Parameterized Convolutions for Efficient Inference , 2019, NeurIPS.

[16]  Michael Felsberg,et al.  A Generative Appearance Model for End-To-End Video Object Segmentation , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[17]  Yizhou Wang,et al.  Video Object Segmentation by Learning Location-Sensitive Embeddings , 2018, ECCV.

[18]  Luc Van Gool,et al.  The 2017 DAVIS Challenge on Video Object Segmentation , 2017, ArXiv.

[19]  Aggelos K. Katsaggelos,et al.  Efficient Video Object Segmentation via Network Modulation , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[20]  Yuning Jiang,et al.  SOLO: Segmenting Objects by Locations , 2019, ECCV.

[21]  Ping Luo,et al.  PolarMask: Single Shot Instance Segmentation With Polar Representation , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[22]  Jason Weston,et al.  End-To-End Memory Networks , 2015, NIPS.

[23]  Atsushi Nakazawa,et al.  Motion Coherent Tracking Using Multi-label MRF Optimization , 2012, International Journal of Computer Vision.

[24]  Stephen Lin,et al.  A Transductive Approach for Video Object Segmentation , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[25]  Yong Jae Lee,et al.  YOLACT: Real-Time Instance Segmentation , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[26]  Ruigang Yang,et al.  Semi-Supervised Video Object Segmentation with Super-Trajectories , 2019, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[27]  Felix Järemo Lawin,et al.  Learning What to Learn for Video Object Segmentation , 2020, ECCV.

[28]  Ling Shao,et al.  RANet: Ranking Attention Network for Fast Video Object Segmentation , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[29]  Bastian Leibe,et al.  FEELVOS: Fast End-To-End Embedding Learning for Video Object Segmentation , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[30]  Ning Xu,et al.  Video Object Segmentation Using Space-Time Memory Networks , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[31]  Hao Chen,et al.  Conditional Convolutions for Instance Segmentation , 2020, ECCV.

[32]  Alexander G. Schwing,et al.  VideoMatch: Matching based Video Object Segmentation , 2018, ECCV.

[33]  Ling Shao,et al.  Zero-Shot Video Object Segmentation via Attentive Graph Neural Networks , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[34]  Ruigang Yang,et al.  Saliency-Aware Video Object Segmentation , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[35]  K.-K. Maninis,et al.  Video Object Segmentation without Temporal Information , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[36]  Wei Liu,et al.  MHP-VOS: Multiple Hypotheses Propagation for Video Object Segmentation , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[37]  Wei Liu,et al.  CNN in MRF: Video Object Segmentation via Inference in a CNN-Based Higher-Order Spatio-Temporal MRF , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[38]  Sanyuan Zhao,et al.  Pyramid Dilated Deeper ConvLSTM for Video Salient Object Detection , 2018, ECCV.

[39]  Michael Felsberg,et al.  Learning Fast and Robust Target Models for Video Object Segmentation , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[40]  Xiaoxiao Li,et al.  Video Object Segmentation with Joint Re-identification and Attention-Aware Mask Propagation , 2018, ECCV.

[41]  Jason Weston,et al.  Key-Value Memory Networks for Directly Reading Documents , 2016, EMNLP.

[42]  Euntai Kim,et al.  Kernelized Memory Network for Video Object Segmentation , 2020, ECCV.

[43]  Sergey Levine,et al.  Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks , 2017, ICML.

[44]  Luc Van Gool,et al.  One-Shot Video Object Segmentation , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[45]  Bastian Leibe,et al.  Online Adaptation of Convolutional Neural Networks for Video Object Segmentation , 2017, BMVC.

[46]  Radomír Mech,et al.  Unsupervised Video Object Segmentation with Joint Hotspot Tracking , 2020, ECCV.

[47]  Bastian Leibe,et al.  PReMVOS: Proposal-generation, Refinement and Merging for Video Object Segmentation , 2018, ACCV.

[48]  Qiang Wang,et al.  Fast Online Object Tracking and Segmentation: A Unifying Approach , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[49]  Steven C. H. Hoi,et al.  Paying Attention to Video Object Pattern Understanding , 2020, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[50]  Luc Van Gool,et al.  Blazingly Fast Video Object Segmentation with Pixel-Wise Metric Learning , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[51]  Ling Shao,et al.  See More, Know More: Unsupervised Video Object Segmentation With Co-Attention Siamese Networks , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[52]  Thomas Brox,et al.  Lucid Data Dreaming for Video Object Segmentation , 2017, International Journal of Computer Vision.

[53]  Kalyan Sunkavalli,et al.  Fast Video Object Segmentation by Reference-Guided Mask Propagation , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[54]  Luc Van Gool,et al.  Temporal Segment Networks: Towards Good Practices for Deep Action Recognition , 2016, ECCV.

[55]  Ning Xu,et al.  YouTube-VOS: Sequence-to-Sequence Video Object Segmentation , 2018, ECCV.

[56]  Sanja Fidler,et al.  DMM-Net: Differentiable Mask-Matching Network for Video Object Segmentation , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[57]  Tianjia Shao,et al.  EmbedMask: Embedding Coupling for One-stage Instance Segmentation , 2019, ArXiv.

[58]  Jiebo Luo,et al.  Zero-Shot Video Object Segmentation With Co-Attention Siamese Networks , 2020, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[59]  R. Venkatesh Babu,et al.  SeamSeg: Video Object Segmentation Using Patch Seams , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.