Adversarial Semi-Supervised Multi-Domain Tracking

Neural networks for multi-domain learning enable an effective combination of information from different domains by sharing and co-learning parameters. In visual tracking, the features that emerge in the shared layers of a multi-domain tracker trained on various sequences are crucial for tracking in unseen videos. Yet, in a fully shared architecture, some of these emerging features are useful only in a specific domain, which reduces the generalization of the learned feature representation. We propose a semi-supervised learning scheme that uses adversarial learning to separate domain-invariant and domain-specific features, encourages mutual exclusion between them, and leverages self-supervised learning on a reservoir of unlabeled data to enhance the shared features. By employing these features and training dedicated layers for each sequence, we build a tracker that performs exceptionally well on different types of videos.
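The abstract names four training ingredients: an adversarial domain classifier on the shared features, a mutual-exclusion constraint between shared and domain-specific features, a self-supervised task on unlabeled frames, and a dedicated branch per sequence. The dependency-free sketch below illustrates one plausible form of each term under stated assumptions. It is not the authors' implementation. The adversarial term assumes a gradient-reversal layer in the style of Ganin and Lempitsky. Mutual exclusion is assumed to be an orthogonality penalty as in Domain Separation Networks. The self-supervised task is assumed to be rotation prediction. Every function name and loss weight is hypothetical.

```python
import math

# Hypothetical, dependency-free sketch of the loss terms named in the abstract.
# Names, pretext task and weights are assumptions, not the authors' code.


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def orthogonality_loss(shared, private):
    """Mutual-exclusion penalty ||S^T P||_F^2 (assumed, as in Domain Separation Networks).

    `shared` and `private` hold one feature vector per sample of the same batch.
    The loss is zero when every shared dimension is orthogonal to every
    domain-specific dimension across the batch.
    """
    total = 0.0
    for i in range(len(shared[0])):
        col_s = [row[i] for row in shared]
        for j in range(len(private[0])):
            col_p = [row[j] for row in private]
            total += dot(col_s, col_p) ** 2
    return total


def domain_cross_entropy(logits, domain):
    """Cross-entropy of a domain (sequence) classifier applied to shared features."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))  # stable log-sum-exp
    return log_z - logits[domain]


class GradientReversal:
    """Identity on the forward pass; scales gradients by -lam on the backward pass.

    The domain classifier still minimizes its loss, while the reversed gradient
    pushes the shared extractor to make the domains indistinguishable.
    """

    def __init__(self, lam):
        self.lam = lam

    def forward(self, x):
        return list(x)

    def backward(self, grad):
        return [-self.lam * g for g in grad]


def head_loss(features, label, heads, domain):
    """Logistic target/background loss of the branch dedicated to `domain`.

    `heads` maps a domain id to (weights, bias); `label` is +1 (target) or -1 (background).
    """
    w, b = heads[domain]
    z = -label * (dot(w, features) + b)
    # numerically stable softplus(z) = log(1 + exp(z))
    return z + math.log1p(math.exp(-z)) if z > 0 else math.log1p(math.exp(z))


def rotate90(patch, k):
    """Rotate a 2-D patch (list of rows) by k * 90 degrees counter-clockwise.

    Assumed pretext task: the network predicts k for unlabeled frames.
    """
    for _ in range(k % 4):
        patch = [list(row) for row in zip(*patch)][::-1]
    return [list(row) for row in patch]


def total_loss(cls, adv, diff, ss, alpha=0.1, beta=0.01, gamma=0.1):
    """Weighted sum of the four terms; alpha, beta, gamma are illustrative only."""
    return cls + alpha * adv + beta * diff + gamma * ss
```

In a real tracker these scalars would be produced by an autodiff framework. The `GradientReversal` class only shows the sign flip such a layer applies to the gradient reaching the shared layers. The orthogonality term is what would discourage the shared and per-domain features from encoding the same information.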

Maryam Sadat Mirzaei | Kourosh Meshgi