论文信息 - UVO Challenge on Video-based Open-World Segmentation 2021: 1st Place Solution

UVO Challenge on Video-based Open-World Segmentation 2021: 1st Place Solution

In this report, we introduce our (pretty straightforard) two-step “detect-then-match” video instance segmentation method. The first step performs instance segmentation for each frame to get a large number of instance mask proposals. The second step is to do inter-frame instance mask matching with the help of optical flow. We demonstrate that with high quality mask proposals, a simple matching mechanism is good enough for tracking. Our approach achieves the first place in the UVO 2021 Video-based Open-World Segmentation Challenge.

Vincent Lepetit | Yang Xiao | Yuming Du | Wen Guo

[1] Silvio Savarese,et al. Generalized Intersection Over Union: A Metric and a Loss for Bounding Box Regression , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[2] Thomas Brox,et al. A Large Dataset to Train Convolutional Networks for Disparity, Optical Flow, and Scene Flow Estimation , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[3] Pietro Perona,et al. Microsoft COCO: Common Objects in Context , 2014, ECCV.

[4] Du Tran,et al. Unidentified Video Objects: A Benchmark for Dense, Open-World Segmentation , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).

[5] Zeming Li,et al. YOLOX: Exceeding YOLO Series in 2021 , 2021, ArXiv.

[6] Kaiming He,et al. Focal Loss for Dense Object Detection , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[7] Kai Chen,et al. MMDetection: Open MMLab Detection Toolbox and Benchmark , 2019, ArXiv.

[8] Yi Li,et al. Deformable Convolutional Networks , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[9] Ross B. Girshick,et al. Mask R-CNN , 2017, 1703.06870.

[10] Kaiming He,et al. Feature Pyramid Networks for Object Detection , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[11] Yuning Jiang,et al. Unified Perceptual Parsing for Scene Understanding , 2018, ECCV.

[12] Chang D. Yoo,et al. Cascade RPN: Delving into High-Quality Region Proposal Network with Adaptive Convolution , 2019, NeurIPS.

[13] Jia Deng,et al. RAFT: Recurrent All-Pairs Field Transforms for Optical Flow , 2020, ECCV.

[14] Ross B. Girshick,et al. Focal Loss for Dense Object Detection , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[15] Frank Hutter,et al. Decoupled Weight Decay Regularization , 2017, ICLR.

[16] Stephen Lin,et al. Swin Transformer: Hierarchical Vision Transformer using Shifted Windows , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).

[17] Kai Chen,et al. CARAFE: Content-Aware ReAssembly of FEatures , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[18] Vincent Lepetit,et al. 1st Place Solution for the UVO Challenge on Image-based Open-World Segmentation 2021 , 2021, ArXiv.

[19] Luc Van Gool,et al. The Pascal Visual Object Classes (VOC) Challenge , 2010, International Journal of Computer Vision.

[20] Hao Chen,et al. FCOS: Fully Convolutional One-Stage Object Detection , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[21] Li Fei-Fei,et al. ImageNet: A large-scale hierarchical image database , 2009, CVPR.

[22] Leonidas J. Guibas,et al. ShapeNet: An Information-Rich 3D Model Repository , 2015, ArXiv.