Accurate Bounding-box Regression with Distance-IoU Loss for Visual Tracking

Most existing tracking methods are based on using a classifier and multi-scale estimation to estimate the state of the target. Consequently, and as expected, trackers have become more stable while tracking accuracy has stagnated. While the ATOM tracker adopts a maximum overlap method based on an intersection-over-union (IoU) loss to mitigate this problem, there are defects in the IoU loss itself, that make it impossible to continue to optimize the objective function when a given bounding box is completely contained within another bounding box; this makes it very challenging to accurately estimate the target state. Accordingly, in this paper, we address the above-mentioned problem by proposing a novel tracking method based on a distance-IoU (DIoU) loss, such that the proposed tracker consists of a target estimation component and a target classification component. The target estimation component is trained to predict the DIoU score between the target ground-truth bounding-box and the estimated bounding-box. The DIoU loss can maintain the advantage provided by the IoU loss while minimizing the distance between the center points of two bounding boxes, thereby making the target estimation more accurate. Moreover, we introduce a classification component that is trained online and optimized with a Conjugate-Gradient-based strategy to guarantee real-time tracking speed. Comprehensive experimental results demonstrate that our DIoUTrack achieves competitive tracking accuracy when compared with state-of-the-art trackers while with a real-time tracking speed.

[1]  Peng Lu,et al.  Learning Aberrance Repressed Correlation Filters for Real-Time UAV Tracking , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[2]  Andrew Blake,et al.  Sparse Bayesian learning for efficient visual tracking , 2005, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[3]  Song Wang,et al.  Learning Dynamic Siamese Network for Visual Object Tracking , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[4]  Rui Caseiro,et al.  High-Speed Tracking with Kernelized Correlation Filters , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[5]  Michael Felsberg,et al.  ATOM: Accurate Tracking by Overlap Maximization , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[6]  Zhipeng Zhang,et al.  Deeper and Wider Siamese Networks for Real-Time Visual Tracking , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[7]  Simone Calderara,et al.  Visual Tracking: An Experimental Survey , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[8]  Luca Bertinetto,et al.  Fully-Convolutional Siamese Networks for Object Tracking , 2016, ECCV Workshops.

[9]  Gongjian Wen,et al.  Masked and dynamic Siamese network for robust visual tracking , 2019, Inf. Sci..

[10]  Yuning Jiang,et al.  Acquisition of Localization Confidence for Accurate Object Detection , 2018, ECCV.

[11]  Fahad Shahbaz Khan,et al.  Learning the Model Update for Siamese Trackers , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[12]  Josef Kittler,et al.  Joint Group Feature Selection and Discriminative Filter Learning for Robust Visual Object Tracking , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[13]  Luca Bertinetto,et al.  End-to-End Representation Learning for Correlation Filter Based Tracking , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[14]  Zdenek Kalal,et al.  Tracking-Learning-Detection , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[15]  Pietro Perona,et al.  Microsoft COCO: Common Objects in Context , 2014, ECCV.

[16]  Changsheng Xu,et al.  Multi-task Correlation Particle Filter for Robust Object Tracking , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[17]  Antoni B. Chan,et al.  Learning Dynamic Memory Networks for Object Tracking , 2018, ECCV.

[18]  Luc Van Gool,et al.  Probabilistic Regression for Visual Tracking , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[19]  Alexander C. Berg,et al.  Meta-Tracker: Fast and Robust Online Adaptation for Visual Object Trackers , 2018, ECCV.

[20]  Arnold W. M. Smeulders,et al.  UvA-DARE (Digital Academic Repository) Siamese Instance Search for Tracking , 2016 .

[21]  Michael Felsberg,et al.  ECO: Efficient Convolution Operators for Tracking , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[22]  Bohyung Han,et al.  Learning Multi-domain Convolutional Neural Networks for Visual Tracking , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[23]  Haibin Ling,et al.  Parallel Tracking and Verifying: A Framework for Real-Time and High Accuracy Visual Tracking , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[24]  Bernard Ghanem,et al.  A Benchmark and Simulator for UAV Tracking , 2016, ECCV.

[25]  Zhenyu He,et al.  Target-Aware Deep Tracking , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[26]  Wei Wu,et al.  Distractor-aware Siamese Networks for Visual Object Tracking , 2018, ECCV.

[27]  Feng Li,et al.  Learning Spatial-Temporal Regularized Correlation Filters for Visual Tracking , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[28]  Zhiyong Li,et al.  Reliable correlation tracking via dual-memory selection model , 2020, Inf. Sci..

[29]  Rongrong Ji,et al.  Robust infrared target tracking based on particle filter with embedded saliency detection , 2015, Inf. Sci..

[30]  Shengping Zhang,et al.  Robust Visual Tracking Using an Effective Appearance Model Based on Sparse Coding , 2012, TIST.

[31]  Chong Luo,et al.  A Twofold Siamese Network for Real-Time Object Tracking , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[32]  Wei Wu,et al.  High Performance Visual Tracking with Siamese Region Proposal Network , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[33]  Bernard Ghanem,et al.  TrackingNet: A Large-Scale Dataset and Benchmark for Object Tracking in the Wild , 2018, ECCV.

[34]  Xin Zhao,et al.  GOT-10k: A Large High-Diversity Benchmark for Generic Object Tracking in the Wild , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[35]  Rynson W. H. Lau,et al.  VITAL: VIsual Tracking via Adversarial Learning , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[36]  Huchuan Lu,et al.  Visual Tracking via Adaptive Spatially-Regularized Correlation Filters , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[37]  Michael Felsberg,et al.  Unveiling the Power of Deep Tracking , 2018, ECCV.

[38]  Wei Wu,et al.  SiamRPN++: Evolution of Siamese Visual Tracking With Very Deep Networks , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[39]  Haibin Ling,et al.  Siamese Cascaded Region Proposal Networks for Real-Time Visual Tracking , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[40]  Yang Li,et al.  YES NO Cartesian Update Update Feature Extraction Feature Extraction Phase Correlation Resample Min Eq . 3 ? Fourier spaceLog-Polar Cross Correlation Model Fourier space Model Sample Sample , 2018 .

[41]  Tianzhu Zhang,et al.  Graph Convolutional Tracking , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[42]  Huchuan Lu,et al.  Real-Time 'Actor-Critic' Tracking , 2018, ECCV.

[43]  Simon Lucey,et al.  Learning Background-Aware Correlation Filters for Visual Tracking , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[44]  Huchuan Lu,et al.  GradNet: Gradient-Guided Network for Visual Object Tracking , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[45]  Zhenyu He,et al.  Particle filter re-detection for visual tracking via correlation filters , 2018, Multimedia Tools and Applications.

[46]  Bohyung Han,et al.  Real-Time MDNet , 2018, ECCV.

[47]  Ming-Hsuan Yang,et al.  Object Tracking Benchmark , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[48]  Yiannis Demiris,et al.  Context-Aware Deep Feature Compression for High-Speed Visual Tracking , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[49]  Wei Liu,et al.  Unsupervised Deep Tracking , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[50]  Huchuan Lu,et al.  Structured Siamese Network for Real-Time Visual Tracking , 2018, ECCV.

[51]  Honggang Zhang,et al.  Deep Attentive Tracking via Reciprocative Learning , 2018, NeurIPS.

[52]  Peng Gao,et al.  Learning Reinforced Attentional Representation for End-to-End Visual Tracking , 2019, Inf. Sci..

[53]  Zhaohui Zheng,et al.  Distance-IoU Loss: Faster and Better Learning for Bounding Box Regression , 2019, AAAI.

[54]  Yuning Jiang,et al.  UnitBox: An Advanced Object Detection Network , 2016, ACM Multimedia.

[55]  Silvio Savarese,et al.  Generalized Intersection Over Union: A Metric and a Loss for Bounding Box Regression , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[56]  Branko Ristic,et al.  Spatio-temporal tracking from natural language statements using outer probability theory , 2018, Inf. Sci..

[57]  Zhenyu He,et al.  Robust Object Tracking via Key Patch Sparse Representation , 2017, IEEE Transactions on Cybernetics.

[58]  Jianbing Shen,et al.  Triplet Loss in Siamese Network for Object Tracking , 2018, ECCV.

[59]  Liang Xiao,et al.  Augmenting cascaded correlation filters with spatial-temporal saliency for visual tracking , 2019, Inf. Sci..

[60]  Luc Van Gool,et al.  Learning Discriminative Model Prediction for Tracking , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[61]  Zhiwei Xiong,et al.  SPM-Tracker: Series-Parallel Matching for Real-Time Visual Object Tracking , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[62]  Fan Yang,et al.  LaSOT: A High-Quality Benchmark for Large-Scale Single Object Tracking , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[63]  Huchuan Lu,et al.  Deep visual tracking: Review and experimental comparison , 2018, Pattern Recognit..