论文信息 - Kernelized Memory Network for Video Object Segmentation

Kernelized Memory Network for Video Object Segmentation

Semi-supervised video object segmentation (VOS) is a task that involves predicting a target object in a video when the ground truth segmentation mask of the target object is given in the first frame. Recently, space-time memory networks (STM) have received significant attention as a promising solution for semi-supervised VOS. However, an important point is overlooked when applying STM to VOS. The solution (STM) is non-local, but the problem (VOS) is predominantly local. To solve the mismatch between STM and VOS, we propose a kernelized memory network (KMN). Before being trained on real videos, our KMN is pre-trained on static images, as in previous works. Unlike in previous works, we use the Hide-and-Seek strategy in pre-training to obtain the best possible results in handling occlusions and segment boundary extraction. The proposed KMN surpasses the state-of-the-art on standard benchmarks by a significant margin (+5% on DAVIS 2017 test-dev set). In addition, the runtime of KMN is 0.12 seconds per frame on the DAVIS 2016 validation set, and the KMN rarely requires extra computation, when compared with STM.

[1] Miriam Bellver,et al. RVOS: End-To-End Recurrent Network for Video Object Segmentation , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[2] Luc Van Gool,et al. Blazingly Fast Video Object Segmentation with Pixel-Wise Metric Learning , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[3] Aggelos K. Katsaggelos,et al. Efficient Video Object Segmentation via Network Modulation , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[4] Wei Liu,et al. CNN in MRF: Video Object Segmentation via Inference in a CNN-Based Higher-Order Spatio-Temporal MRF , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[5] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[6] Xiaoxiao Li,et al. Video Object Segmentation with Joint Re-identification and Attention-Aware Mask Propagation , 2018, ECCV.

[7] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.

[8] Luc Van Gool,et al. The 2017 DAVIS Challenge on Video Object Segmentation , 2017, ArXiv.

[9] Luc Van Gool,et al. The Pascal Visual Object Classes (VOC) Challenge , 2010, International Journal of Computer Vision.

[10] Ning Xu,et al. YouTube-VOS: Sequence-to-Sequence Video Object Segmentation , 2018, ECCV.

[11] Bernt Schiele,et al. Learning Video Object Segmentation from Static Images , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[12] Jingdong Wang,et al. Salient Object Detection: A Discriminative Regional Feature Integration Approach , 2013, International Journal of Computer Vision.

[13] Ning Xu,et al. Video Object Segmentation Using Space-Time Memory Networks , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[14] Yizhou Wang,et al. Video Object Segmentation by Learning Location-Sensitive Embeddings , 2018, ECCV.

[15] Alexander G. Schwing,et al. MaskRNN: Instance Level Video Object Segmentation , 2018, NIPS.

[16] Richard Socher,et al. Ask Me Anything: Dynamic Memory Networks for Natural Language Processing , 2015, ICML.

[17] Yong Jae Lee,et al. Hide-and-Seek: Forcing a Network to be Meticulous for Weakly-Supervised Object and Action Localization , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[18] Jean Ponce,et al. SFNet: Learning Object-Aware Semantic Correspondence , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[19] Bastian Leibe,et al. FEELVOS: Fast End-To-End Embedding Learning for Video Object Segmentation , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[20] Shi-Min Hu,et al. Global contrast based salient region detection , 2011, CVPR 2011.

[21] Li Xu,et al. Hierarchical Image Saliency Detection on Extended CSSD , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[22] Jason Weston,et al. Key-Value Memory Networks for Directly Reading Documents , 2016, EMNLP.

[23] Bastian Leibe,et al. Online Adaptation of Convolutional Neural Networks for Video Object Segmentation , 2017, BMVC.

[24] Ming-Hsuan Yang,et al. SegFlow: Joint Learning for Video Object Segmentation and Optical Flow , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[25] Ming-Ming Cheng,et al. EGNet: Edge Guidance Network for Salient Object Detection , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[26] Pietro Perona,et al. Microsoft COCO: Common Objects in Context , 2014, ECCV.

[27] Alexander G. Schwing,et al. VideoMatch: Matching based Video Object Segmentation , 2018, ECCV.

[28] Subhransu Maji,et al. Semantic contours from inverse detectors , 2011, 2011 International Conference on Computer Vision.

[29] Zhe L. Lin,et al. Fast Video Object Segmentation via Dynamic Targeting Network , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[30] Qingming Huang,et al. Label Decoupling Framework for Salient Object Detection , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[31] Jun-Sik Kim,et al. Pixel-Level Matching for Video Object Segmentation Using Convolutional Neural Networks , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[32] Jian Sun,et al. Identity Mappings in Deep Residual Networks , 2016, ECCV.

[33] Thomas Brox,et al. Lucid Data Dreaming for Video Object Segmentation , 2017, International Journal of Computer Vision.

[34] Lukasz Kaiser,et al. Attention is All you Need , 2017, NIPS.

[35] Alexander Sorkine-Hornung,et al. Bilateral Space Video Segmentation , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[36] Abhinav Gupta,et al. Non-local Neural Networks , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[37] Sanja Fidler,et al. DMM-Net: Differentiable Mask-Matching Network for Video Object Segmentation , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[38] Euntai Kim,et al. Video Multitask Transformer Network , 2019, 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW).

[39] Ming-Hsuan Yang,et al. Fast and Accurate Online Video Object Segmentation via Tracking Parts , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[40] Peter V. Gehler,et al. Video Propagation Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[41] Mubarak Shah,et al. CapsuleVOS: Semi-Supervised Video Object Segmentation Using Capsule Routing , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[42] Xiang Bai,et al. Asymmetric Non-Local Neural Networks for Semantic Segmentation , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[43] K.-K. Maninis,et al. Video Object Segmentation without Temporal Information , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[44] Xiaojuan Qi,et al. AGSS-VOS: Attention Guided Single-Shot Video Object Segmentation , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[45] Kalyan Sunkavalli,et al. Fast Video Object Segmentation by Reference-Guided Mask Propagation , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[46] Dustin Tran,et al. Image Transformer , 2018, ICML.

[47] Michael J. Black,et al. Video Segmentation via Object Flow , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[48] Luc Van Gool,et al. A Benchmark Dataset and Evaluation Methodology for Video Object Segmentation , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[49] Jason Weston,et al. End-To-End Memory Networks , 2015, NIPS.

[50] Andrew Zisserman,et al. Video Action Transformer Network , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[51] Bastian Leibe,et al. PReMVOS: Proposal-generation, Refinement and Merging for Video Object Segmentation , 2018, ACCV.

[52] Ling Shao,et al. RANet: Ranking Attention Network for Fast Video Object Segmentation , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[53] Luc Van Gool,et al. One-Shot Video Object Segmentation , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[54] Michael Felsberg,et al. A Generative Appearance Model for End-To-End Video Object Segmentation , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).