VTT: Long-term Visual Tracking with Transformers

[1]  Michael Felsberg,et al.  ATOM: Accurate Tracking by Overlap Maximization , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[2]  Arnold W. M. Smeulders,et al.  UvA-DARE (Digital Academic Repository) Siamese Instance Search for Tracking , 2016 .

[3]  Kaiming He,et al.  Feature Pyramid Networks for Object Detection , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[4]  Dit-Yan Yeung,et al.  Visual Object Tracking for Unmanned Aerial Vehicles: A Benchmark and New Motion Models , 2017, AAAI.

[5]  Michael Felsberg,et al.  ECO: Efficient Convolution Operators for Tracking , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[6]  Vineet Gandhi,et al.  Long-Term Visual Object Tracking Benchmark , 2017, ACCV.

[7]  Silvio Savarese,et al.  Generalized Intersection Over Union: A Metric and a Loss for Bounding Box Regression , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[8]  Huchuan Lu,et al.  Learning regression and verification networks for long-term visual tracking , 2018, ArXiv.

[9]  Bohyung Han,et al.  Learning Multi-domain Convolutional Neural Networks for Visual Tracking , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[10]  Wei Wu,et al.  SiamRPN++: Evolution of Siamese Visual Tracking With Very Deep Networks , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[11]  Arnold W. M. Smeulders,et al.  Long-term Tracking in the Wild: A Benchmark , 2018, ECCV.

[12]  Ming-Hsuan Yang,et al.  Long-term correlation tracking , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[13]  Xin Zhao,et al.  GlobalTrack: A Simple and Strong Baseline for Long-term Tracking , 2019, AAAI.

[14]  Dustin Tran,et al.  Image Transformer , 2018, ICML.

[15]  Ming-Wei Chang,et al.  BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.

[16]  Nicolas Usunier,et al.  End-to-End Object Detection with Transformers , 2020, ECCV.

[17]  Song Wang,et al.  Learning Dynamic Siamese Network for Visual Object Tracking , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[18]  Wei Wu,et al.  Distractor-aware Siamese Networks for Visual Object Tracking , 2018, ECCV.

[19]  Huchuan Lu,et al.  ‘Skimming-Perusal’ Tracking: A Framework for Real-Time and Robust Long-Term Tracking , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[20]  Fan Yang,et al.  LaSOT: A High-Quality Benchmark for Large-Scale Single Object Tracking , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[21]  Lukasz Kaiser,et al.  Attention is All you Need , 2017, NIPS.

[22]  Luca Bertinetto,et al.  Fully-Convolutional Siamese Networks for Object Tracking , 2016, ECCV Workshops.

[23]  Jin Young Choi,et al.  Action-Decision Networks for Visual Tracking with Deep Reinforcement Learning , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[24]  Xin Zhao,et al.  GOT-10k: A Large High-Diversity Benchmark for Generic Object Tracking in the Wild , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[25]  Pietro Perona,et al.  Microsoft COCO: Common Objects in Context , 2014, ECCV.

[26]  Frank Hutter,et al.  Decoupled Weight Decay Regularization , 2017, ICLR.

[27]  A. Hampapur,et al.  Smart video surveillance: exploring the concept of multiscale spatiotemporal tracking , 2005, IEEE Signal Processing Magazine.

[28]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[29]  Wei Wu,et al.  High Performance Visual Tracking with Siamese Region Proposal Network , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[30]  Jason Jianjun Gu,et al.  Self-paced model learning for robust visual tracking , 2017, J. Electronic Imaging.

[31]  Rynson W. H. Lau,et al.  VITAL: VIsual Tracking via Adversarial Learning , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[32]  Quoc V. Le,et al.  Attention Augmented Convolutional Networks , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[33]  Zdenek Kalal,et al.  Tracking-Learning-Detection , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[34]  Ross B. Girshick,et al.  Mask R-CNN , 2017, 1703.06870.

[35]  Hongdong Li,et al.  Beyond Local Search: Tracking Objects Everywhere with Instance-Specific Proposals , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[36]  Hermann Ney,et al.  RWTH ASR Systems for LibriSpeech: Hybrid vs Attention - w/o Data Augmentation , 2019, INTERSPEECH.

[37]  Henry Schneiderman,et al.  Real-time Visual Processing For Autonomous Driving , 1993, Proceedings of the Intelligent Vehicles '93 Symposium.

[38]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.