Collaborative Video Object Segmentation by Multi-Scale Foreground-Background Integration

This paper investigates the principles of embedding learning to tackle the challenging semi-supervised video object segmentation. Unlike previous practices that focus on exploring the embedding learning of foreground object (s), we consider background should be equally treated. Thus, we propose a Collaborative video object segmentation by Foreground-Background Integration (CFBI) approach. CFBI separates the feature embedding into the foreground object region and its corresponding background region, implicitly promoting them to be more contrastive and improving the segmentation results accordingly. Moreover, CFBI performs both pixel-level matching processes and instance-level attention mechanisms between the reference and the predicted sequence, making CFBI robust to various object scales. Based on CFBI, we introduce a multi-scale matching structure and propose an Atrous Matching strategy, resulting in a more robust and efficient framework, CFBI+. We conduct extensive experiments on two popular benchmarks, i.e., DAVIS and YouTube-VOS. Without applying any simulated data for pre-training, our CFBI+ achieves the performance (J&F) of 82.9% and 82.8%, outperforming all the other state-of-the-art methods. Code: this https URL.

[1]  Vladlen Koltun,et al.  Multi-Scale Context Aggregation by Dilated Convolutions , 2015, ICLR.

[2]  Luca Antiga,et al.  Automatic differentiation in PyTorch , 2017 .

[3]  Yuchen Fan,et al.  Video Instance Segmentation , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[4]  Pietro Perona,et al.  Microsoft COCO: Common Objects in Context , 2014, ECCV.

[5]  Luc Van Gool,et al.  Video Object Segmentation with Episodic Graph Memory Networks , 2020, ECCV.

[6]  Richard Kronland-Martinet,et al.  A real-time algorithm for signal analysis with the help of the wavelet transform , 1989 .

[7]  Bastian Leibe,et al.  FEELVOS: Fast End-To-End Embedding Learning for Video Object Segmentation , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[8]  Sanja Fidler,et al.  Instance-Level Segmentation for Autonomous Driving with Deep Densely Connected MRFs , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[9]  Yunchao Wei,et al.  Memory Aggregation Networks for Efficient Interactive Video Object Segmentation , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[10]  King Ngi Ngan,et al.  Video Segmentation and Its Applications , 2011 .

[11]  Li Xu,et al.  Hierarchical Image Saliency Detection on Extended CSSD , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[12]  Kaiming He,et al.  Group Normalization , 2018, ECCV.

[13]  Euntai Kim,et al.  Kernelized Memory Network for Video Object Segmentation , 2020, ECCV.

[14]  Yunchao Wei,et al.  Collaborative Video Object Segmentation by Foreground-Background Integration , 2020, ECCV.

[15]  Enhua Wu,et al.  Squeeze-and-Excitation Networks , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[16]  George Papandreou,et al.  Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation , 2018, ECCV.

[17]  Michael Felsberg,et al.  Learning What to Learn for Video Object Segmentation , 2020, ECCV.

[18]  Kaiming He,et al.  Feature Pyramid Networks for Object Detection , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[19]  Xiaogang Wang,et al.  Pyramid Scene Parsing Network , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[20]  Xinggang Wang,et al.  Motion-Guided Spatial Time Attention for Video Object Segmentation , 2019, 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW).

[21]  Ning Xu,et al.  YouTube-VOS: A Large-Scale Video Object Segmentation Benchmark , 2018, ArXiv.

[22]  Luc Van Gool,et al.  One-Shot Video Object Segmentation , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[23]  Michael Felsberg,et al.  A Generative Appearance Model for End-To-End Video Object Segmentation , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[24]  Li Fei-Fei,et al.  ImageNet: A large-scale hierarchical image database , 2009, CVPR.

[25]  Bernt Schiele,et al.  Learning Video Object Segmentation from Static Images , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[26]  Guosheng Lin,et al.  MoNet: Deep Motion Exploitation for Video Object Segmentation , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[27]  Shi-Min Hu,et al.  Global contrast based salient region detection , 2011, CVPR 2011.

[28]  Alexander G. Schwing,et al.  VideoMatch: Matching based Video Object Segmentation , 2018, ECCV.

[29]  Subhransu Maji,et al.  Semantic contours from inverse detectors , 2011, 2011 International Conference on Computer Vision.

[30]  Ning Xu,et al.  Video Object Segmentation Using Space-Time Memory Networks , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[31]  Ning Xu,et al.  Fast User-Guided Video Object Segmentation by Interaction-And-Propagation Networks , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[32]  Bastian Leibe,et al.  Online Adaptation of Convolutional Neural Networks for Video Object Segmentation , 2017, BMVC.

[33]  Iasonas Kokkinos,et al.  DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[34]  Ming-Hsuan Yang,et al.  Fast and Accurate Online Video Object Segmentation via Tracking Parts , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[35]  Aggelos K. Katsaggelos,et al.  Efficient Video Object Segmentation via Network Modulation , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[36]  Luc Van Gool,et al.  The Pascal Visual Object Classes (VOC) Challenge , 2010, International Journal of Computer Vision.

[37]  Yann Dauphin,et al.  Language Modeling with Gated Convolutional Networks , 2016, ICML.

[38]  Sergey Ioffe,et al.  Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.

[39]  Thomas Brox,et al.  FlowNet: Learning Optical Flow with Convolutional Networks , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[40]  Luc Van Gool,et al.  Blazingly Fast Video Object Segmentation with Pixel-Wise Metric Learning , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[41]  Pengfei Xiong,et al.  Enhanced Memory Network for Video Segmentation , 2019, 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW).

[42]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[43]  Yunchao Wei,et al.  Dual Embedding Learning for Video Instance Segmentation , 2019, 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW).

[44]  Kalyan Sunkavalli,et al.  Fast Video Object Segmentation by Reference-Guided Mask Propagation , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[45]  François Chollet,et al.  Xception: Deep Learning with Depthwise Separable Convolutions , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[46]  Antonio Torralba,et al.  Contextual Priming for Object Detection , 2003, International Journal of Computer Vision.

[47]  Luc Van Gool,et al.  The 2017 DAVIS Challenge on Video Object Segmentation , 2017, ArXiv.

[48]  Bastian Leibe,et al.  BoLTVOS: Box-Level Tracking for Video Object Segmentation , 2019, ArXiv.

[49]  Yann Dauphin,et al.  Convolutional Sequence to Sequence Learning , 2017, ICML.

[50]  Yi Yang,et al.  Gated Channel Transformation for Visual Recognition , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[51]  Luc Van Gool,et al.  A Benchmark Dataset and Evaluation Methodology for Video Object Segmentation , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).