The Ninth Visual Object Tracking VOT2021 Challenge Results

The Visual Object Tracking challenge VOT2021 is the ninth annual tracker benchmarking activity organized by the VOT initiative. Results of 71 trackers are presented; many are state-of-the-art trackers published at major computer vision conferences or in journals in recent years. The VOT2021 challenge was composed of four sub-challenges focusing on different tracking domains: (i) the VOT-ST2021 challenge focused on short-term tracking in RGB, (ii) the VOT-RT2021 challenge focused on "real-time" short-term tracking in RGB, (iii) the VOT-LT2021 challenge focused on long-term tracking, namely coping with target disappearance and reappearance, and (iv) the VOT-RGBD2021 challenge focused on long-term tracking in RGB and depth imagery. The VOT-ST2021 dataset was refreshed, while VOT-RGBD2021 introduced a training dataset and a sequestered dataset for winner identification. The source code for most of the trackers, the datasets, the evaluation kit and the results are publicly available at the challenge website.

Yong Wang | Ling Shao | Michael Felsberg | Luc Van Gool | Radu Timofte | Josef Kittler | Chunhui Zhang | Shiming Ge | Xiao-Jun Wu | Mohamed H. Abdelpakey | Joni-Kristian Kämäräinen | Niki Martinel | Christian Micheloni | Martin Danelljan | Gustav Häger | Mohamed Shehata | Hyung Jin Chang | Qili Deng | Jun Yin | Bineng Zhong | Philip H. S. Torr | Matej Kristan | Filiz Gurkan | Bilge Gunsel | Ondrej Drbohlav | Qingjie Liu | Aleš Leonardis | Paul Voigtlaender | Xingping Dong | Yutao Cui | Miao Cheng | Houwen Peng | Zhen-Hua Feng | Felix Järemo Lawin | Rongrong Ji | Luka Čehovin Zajc | Andreas Robinson | Goutam Bhat | Bedirhan Uzun | Hasan Saribas | Kangkai Zhang | Byeong Hak Kim | Gustavo Fernández | Alireza Memarmoghadam | Jonathon Luiten | Ozgun Cirakman | Linyuan Wang | Llukman Cerkezi | Roman Pflugfelder | Xiangyuan Lan | Hakan Cevikalp | Jani Käpylä | Ziang Ma | Shang-Jhih Jhang | Alan Lukežič | Kenan Dai | Mohana Murali Dasari | Fahad Shahbaz Khan | Rama Krishna Gorthi | Danda Paudel | Yunhong Wang | Bo Liu | Dong Wang | Xiaohan Zhang | Bin Yan | Hui Li | Tianyang Xu | Fei Xie | Huchuan Lu | Qing Guo | Felix Juefei-Xu | Xiaoning Song | Wei Lu | Xiang Xu | Wankou Yang | Haitao Zhang | Matteo Dunnhofer | Chengwei Zhang | Gangshan Wu | Zhiyong Feng | Cheng Jiang | Yuzhen Niu | Xiaolin Zhang | Lijun Wang | Ziyi Cheng | Wanli Xue | Shengyong Chen | Xianxian Li | Guangting Wang | Yuezhou Li | Yu Ye | Chi-Yi Tsai | Jianbing Shen | Jingen Liu | Xinyu Zhang | Bastian Leibe | Jiawen Zhu | Limin Wang | Kaihua Zhang | Song Yan | Jinyu Yang | Xin Chen | Zhangyong Tang | Wencheng Han | Xue-Feng Zhu | Shoumeng Qiu | Yuzhang Gu | Christoph Mayer | Jiří Matas | Zhongqun Zhang | Yu-Chen Chiu | Daniel K. Du | Zhihong Fu | Yanyan Huang | Yingjie Jiang | Yin Jun | Xiao Ke | Jun Ha Lee | Jianhua Li | Chang Liu | Li Liu | Jie Ma | Aravindh Rajiv | Muhammad Rana | Furao Shen | Kristian Simonato | Liangliang Wang | Chenyan Wu | Xiaoyun Yang | Zhibin Zhang | Shaochuan Zhao | Ming Zhen
