Paper Information - The Ninth Visual Object Tracking VOT2021 Challenge Results

The Visual Object Tracking challenge VOT2021 is the ninth annual tracker benchmarking activity organized by the VOT initiative. Results of 71 trackers are presented; many are state-of-the-art trackers published at major computer vision conferences or in journals in recent years. The VOT2021 challenge was composed of four sub-challenges focusing on different tracking domains: (i) the VOT-ST2021 challenge focused on short-term tracking in RGB, (ii) the VOT-RT2021 challenge focused on "real-time" short-term tracking in RGB, (iii) the VOT-LT2021 challenge focused on long-term tracking, namely coping with target disappearance and reappearance, and (iv) the VOT-RGBD2021 challenge focused on long-term tracking in RGB and depth imagery. The VOT-ST2021 dataset was refreshed, while VOT-RGBD2021 introduced a training dataset and a sequestered dataset for winner identification. The datasets, the evaluation kit, and the results, along with the source code for most of the trackers, are publicly available at the challenge website.
