3rd Place Solution for PVUW2023 VSS Track: A Large Model for Semantic Segmentation on VSPW
暂无分享,去创建一个
Huchuan Lu | Zhe Chen | Xiaoqi Zhao | Lihe Zhang | Jiawen Zhu | Lu Zhang | Zeqi Hao | Shijie Chang | Ben Kang
[1] Ross B. Girshick,et al. Segment Anything , 2023, 2023 IEEE/CVF International Conference on Computer Vision (ICCV).
[2] Jifeng Dai,et al. BEVFormer v2: Adapting Modern Image Backbones to Bird's-Eye-View Recognition via Perspective Supervision , 2022, 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[3] Hongsheng Li,et al. Uni-Perceiver v2: A Generalist Model for Large-Scale Vision and Vision-Language Tasks , 2022, 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[4] Jifeng Dai,et al. Towards All-in-One Pre-Training via Maximizing Multi-Modal Mutual Information , 2022, 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[5] Hongsheng Li,et al. InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions , 2022, 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[6] Xiaogang Wang,et al. Uni-Perceiver-MoE: Learning Sparse Generalist Models with Conditional MoEs , 2022, NeurIPS.
[7] Yunchao Wei,et al. Large-scale Video Panoptic Segmentation in the Wild: A Benchmark , 2022, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[8] Jifeng Dai,et al. Vision Transformer Adapter for Dense Predictions , 2022, ICLR.
[9] Jifeng Dai,et al. BEVFormer: Learning Bird's-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers , 2022, ECCV.
[10] Trevor Darrell,et al. A ConvNet for the 2020s , 2022, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[11] A. Schwing,et al. Masked-attention Mask Transformer for Universal Image Segmentation , 2021, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[12] Xizhou Zhu,et al. Uni-Perceiver: Pre-training Unified Architecture for Generic Perception for Zero-shot and Few-shot Tasks , 2021, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[13] Jiaxu Miao,et al. VSPW: A Large-scale Dataset for Video Scene Parsing in the Wild , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[14] Chi-Keung Tang,et al. CascadePSP: Toward Class-Agnostic and Very High-Resolution Segmentation via Global and Local Refinement , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[15] Yuning Jiang,et al. Unified Perceptual Parsing for Scene Understanding , 2018, ECCV.
[16] Vittorio Ferrari,et al. COCO-Stuff: Thing and Stuff Classes in Context , 2016, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[17] Bolei Zhou,et al. Semantic Understanding of Scenes Through the ADE20K Dataset , 2016, International Journal of Computer Vision.
[18] Sebastian Ramos,et al. The Cityscapes Dataset for Semantic Urban Scene Understanding , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[19] Michael S. Bernstein,et al. ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.
[20] Pietro Perona,et al. Microsoft COCO: Common Objects in Context , 2014, ECCV.
[21] Stephen Lin,et al. Swin Transformer: Hierarchical Vision Transformer using Shifted Windows , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).