Benchmarking Deep Trackers on Aerial Videos

In recent years, deep learning-based visual object trackers have achieved state-of-the-art performance on several visual object tracking benchmarks. However, most tracking benchmarks focus on ground-level videos, whereas aerial tracking presents a new set of challenges. In this paper, we compare ten deep learning-based trackers on four aerial datasets. We choose top-performing trackers that use different approaches, specifically tracking-by-detection, discriminative correlation filters, Siamese networks, and reinforcement learning. As benchmark datasets, we use a subset of the OTB2015 dataset with aerial-style videos; the UAV123 dataset without synthetic sequences; the UAV20L dataset, which contains 20 long sequences; and the DTB70 dataset. We compare the advantages and disadvantages of the trackers in the different tracking situations encountered in aerial data. Our findings indicate that the trackers perform significantly worse on aerial datasets than on standard ground-level videos. We attribute this effect to smaller target size, camera motion, significant camera rotation with respect to the target, out-of-view movement, and clutter in the form of occlusions or similar-looking distractors near the tracked object.
