Paper Information - Progressive Unsupervised Learning for Visual Object Tracking

Progressive Unsupervised Learning for Visual Object Tracking

In this paper, we propose a progressive unsupervised learning (PUL) framework that entirely removes the need for annotated training videos in visual tracking. Specifically, we first learn a background discrimination (BD) model that effectively distinguishes an object from the background via contrastive learning. We then employ the BD model to progressively mine temporally corresponding patches (i.e., patches connected by a track) in sequential frames. Because the BD model is imperfect, the mined patch pairs are noisy, so we propose a noise-robust loss function to learn temporal correspondences more effectively from this noisy data. We use the proposed noise-robust loss to train the backbone networks of Siamese trackers. Without online fine-tuning or adaptation, our unsupervised real-time Siamese trackers can outperform state-of-the-art unsupervised deep trackers and achieve results competitive with the supervised baselines.
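The abstract says that the BD model separates an object from the background via contrastive learning, but it does not state the exact objective. As a rough illustration only, the sketch below uses the widely used InfoNCE contrastive loss, which may differ from the paper's actual formulation. The function name `info_nce_loss`, the `temperature` value, and the toy embeddings are all assumptions made for this example: the anchor and positive stand in for two views of the same object patch, and the negatives stand in for background patches.

```python
import numpy as np


def info_nce_loss(anchor, positive, negatives, temperature=0.1):
    """InfoNCE loss for a single anchor embedding (illustrative sketch).

    anchor:    (d,)   embedding of an object patch
    positive:  (d,)   embedding of another view of the same object
    negatives: (n, d) embeddings of background patches
    """
    def l2_normalize(x):
        # Unit-normalize along the feature axis so dot products are cosine similarities.
        return x / np.linalg.norm(x, axis=-1, keepdims=True)

    a = l2_normalize(np.asarray(anchor, dtype=float))
    p = l2_normalize(np.asarray(positive, dtype=float))
    n = l2_normalize(np.asarray(negatives, dtype=float))

    # Similarity to the positive goes first, followed by similarities to the negatives.
    logits = np.concatenate(([a @ p], n @ a)) / temperature
    logits = logits - logits.max()  # subtract the max for numerical stability

    # Negative log-probability of picking the positive among all candidates.
    log_prob_positive = logits[0] - np.log(np.exp(logits).sum())
    return -log_prob_positive


# Toy 2-D embeddings (made up for illustration).
object_view_a = np.array([1.0, 0.0])
object_view_b = np.array([0.9, 0.1])
background = np.array([[-1.0, 0.0], [0.0, 1.0]])

loss_aligned = info_nce_loss(object_view_a, object_view_b, background)
loss_confused = info_nce_loss(object_view_a, background[0],
                              np.stack([object_view_b, background[1]]))
```

In this toy setup the loss is small when the two object views agree and background patches are dissimilar, and large when a background patch is wrongly treated as the positive, which is the behavior a background-discrimination objective needs.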

Antoni B. Chan | Jia Wan | Wu
