VisDrone-SOT2018: The Vision Meets Drone Single-Object Tracking Challenge Results

Single-object tracking, also known as visual tracking, on drone platforms has recently attracted much attention, with applications in computer vision such as filming and surveillance. However, the lack of commonly accepted annotated datasets and a standard evaluation platform hinders the development of algorithms. To address this issue, the Vision Meets Drone Single-Object Tracking (VisDrone-SOT2018) Challenge workshop was organized in conjunction with the 15th European Conference on Computer Vision (ECCV 2018) to track and advance the technologies in this field. Specifically, we collected a dataset of 132 video sequences divided into three non-overlapping sets: training (86 sequences with 69,941 frames), validation (11 sequences with 7,046 frames), and testing (35 sequences with 29,367 frames). We provide fully annotated bounding boxes of the targets as well as several useful attributes, e.g., occlusion, background clutter, and camera motion. The tracking targets in these sequences include pedestrians, cars, buses, and animals. The dataset is extremely challenging due to various factors, such as occlusion, large scale, pose variation, and fast motion. We present the evaluation protocol of the VisDrone-SOT2018 challenge and the results of a comparison of 22 trackers on the benchmark dataset, which are publicly available on the challenge website: http://www.aiskyeye.com/. We hope this challenge will substantially boost research and development in single-object tracking on drone platforms.

Wei Zhang | Yiannis Kompatsiaris | Martin Lauer | Qinghua Hu | Yong Wang | Qingming Huang | Jie Zhang | Xin Zhang | Yifan Zhang | Jing Li | Wei Tian | Yifan Yang | Lu Ding | Haojie Li | Hao Liu | Yang Meng | Jungong Han | Jin Young Choi | Haibin Ling | Pengfei Zhu | Jian Cheng | Robert Laganière | Baochang Zhang | Xiaoyu Liu | Xiaotong Li | Weiming Hu | Jongwon Choi | Stefanos Vrochidis | Dongdong Li | Qianqian Xu | Chunlei Liu | Qingshan Liu | Longyin Wen | Qiang Wang | Dawei Du | Sangdoo Yun | Byeongho Heo | Wenhao Wang | Haotian Wu | Yuankai Qi | Weidong Chen | Wenrui Ding | Ke Song | Hao Cheng | Yanyun Zhao | Xiao Bian | Xinbin Luo | Shengyin Zhu | Jinyu Yang | Emmanouil Michail | Konstantinos Avgerinakis | Xiaohao He | Hanlin Chen | Chenfeng Liu | Juanping Zhao | Yangliu Kuai | Zhipeng Deng | Peizhen Zhang | Asanka G. Perera | Zhiqun He | Lianjie Wang | Jiaqing Fan | Qinqin Nie | Panagiotis Giannakeris | Wenhua Zhang | Kaihua Zhang | Sihang Wu | Kyuewang Lee | Wenya Ma | Xixi Hu | Kaiwen Duan | Ruixin Zhang | Yaxuan Li
