Adversarial Semi-Supervised Multi-Domain Tracking

Neural networks for multi-domain learning empowers an effective combination of information from different domains by sharing and co-learning the parameters. In visual tracking, the emerging features in shared layers of a multi-domain tracker, trained on various sequences, are crucial for tracking in unseen videos. Yet, in a fully shared architecture, some of the emerging features are useful only in a specific domain, reducing the generalization of the learned feature representation. We propose a semi-supervised learning scheme to separate domain-invariant and domain-specific features using adversarial learning, to encourage mutual exclusion between them, and to leverage self-supervised learning for enhancing the shared features using the unlabeled reservoir. By employing these features and training dedicated layers for each sequence, we build a tracker that performs exceptionally on different types of videos.

[1]  Martial Hebert,et al.  Shuffle and Learn: Unsupervised Learning Using Temporal Order Verification , 2016, ECCV.

[2]  Feng Li,et al.  Learning Spatial-Temporal Regularized Correlation Filters for Visual Tracking , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[3]  Xin Pan,et al.  YouTube-BoundingBoxes: A Large High-Precision Human-Annotated Data Set for Object Detection in Video , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[4]  Trevor Darrell,et al.  Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[5]  Sergio Guadarrama,et al.  Tracking Emerges by Colorizing Videos , 2018, ECCV.

[6]  Abhinav Gupta,et al.  Transferring Rich Feature Hierarchies for Robust Visual Tracking , 2015, ArXiv.

[7]  Wei Wu,et al.  High Performance Visual Tracking with Siamese Region Proposal Network , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[8]  Qiang Yang,et al.  An Overview of Multi-task Learning , 2018 .

[9]  Koby Crammer,et al.  A theory of learning from different domains , 2010, Machine Learning.

[10]  Wei Wu,et al.  End-to-End Flow Correlation Tracking with Spatial-Temporal Attention , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[11]  Simon Lucey,et al.  Learning Background-Aware Correlation Filters for Visual Tracking , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[12]  Rob Fergus,et al.  Visualizing and Understanding Convolutional Networks , 2013, ECCV.

[13]  Xin Zhao,et al.  GlobalTrack: A Simple and Strong Baseline for Long-term Tracking , 2019, AAAI.

[14]  Thomas Wolf,et al.  A Hierarchical Multi-task Approach for Learning Embeddings from Semantic Tasks , 2018, AAAI.

[15]  Michèle Sebag,et al.  Multi-Domain Adversarial Learning , 2019, ICLR.

[16]  Marek Rei,et al.  Semi-supervised Multitask Learning for Sequence Labeling , 2017, ACL.

[17]  Richard P. Wildes,et al.  Spatiotemporal Residual Networks for Video Action Recognition , 2016, NIPS.

[18]  Yi Li,et al.  DeepTrack: Learning Discriminative Feature Representations Online for Robust Visual Tracking , 2015, IEEE Transactions on Image Processing.

[19]  Yaser S. Abu-Mostafa,et al.  Learning from hints in neural networks , 1990, J. Complex..

[20]  Michael Felsberg,et al.  The Sixth Visual Object Tracking VOT2018 Challenge Results , 2018, ECCV Workshops.

[21]  David D. Lewis,et al.  Heterogeneous Uncertainty Sampling for Supervised Learning , 1994, ICML.

[22]  Bohyung Han,et al.  Real-Time MDNet , 2018, ECCV.

[23]  Yihong Gong,et al.  Human Tracking Using Convolutional Neural Networks , 2010, IEEE Transactions on Neural Networks.

[24]  Andrew Zisserman,et al.  Two-Stream Convolutional Networks for Action Recognition in Videos , 2014, NIPS.

[25]  Luc Van Gool,et al.  Learning Discriminative Model Prediction for Tracking , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[26]  Trevor Cohn,et al.  Low Resource Dependency Parsing: Cross-lingual Parameter Sharing in a Neural Network Parser , 2015, ACL.

[27]  Rich Caruana,et al.  Multitask Learning , 1998, Encyclopedia of Machine Learning and Data Mining.

[28]  Wei Wu,et al.  SiamRPN++: Evolution of Siamese Visual Tracking With Very Deep Networks , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[29]  Stan Sclaroff,et al.  MEEM: Robust Tracking via Multiple Experts Using Entropy Minimization , 2014, ECCV.

[30]  Shin Ishii,et al.  Active discriminative tracking using collective memory , 2017, 2017 Fifteenth IAPR International Conference on Machine Vision Applications (MVA).

[31]  Nikos Komodakis,et al.  Unsupervised Representation Learning by Predicting Image Rotations , 2018, ICLR.

[32]  Bernard Ghanem,et al.  TrackingNet: A Large-Scale Dataset and Benchmark for Object Tracking in the Wild , 2018, ECCV.

[33]  Guan Huang,et al.  UCT: Learning Unified Convolutional Networks for Real-Time Visual Tracking , 2017, 2017 IEEE International Conference on Computer Vision Workshops (ICCVW).

[34]  Alexei A. Efros,et al.  Unsupervised Visual Representation Learning by Context Prediction , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[35]  George Trigeorgis,et al.  Domain Separation Networks , 2016, NIPS.

[36]  Zhe Chen,et al.  MUlti-Store Tracker (MUSTer): A cognitive psychology inspired approach to object tracking , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[37]  Luca Bertinetto,et al.  Fully-Convolutional Siamese Networks for Object Tracking , 2016, ECCV Workshops.

[38]  Michael Felsberg,et al.  Beyond Correlation Filters: Learning Continuous Convolution Operators for Visual Tracking , 2016, ECCV.

[39]  Bohyung Han,et al.  Learning Multi-domain Convolutional Neural Networks for Visual Tracking , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[40]  Fan Yang,et al.  Efficient Correlation Tracking via Center-Biased Spatial Regularization , 2018, IEEE Transactions on Image Processing.

[41]  Ming-Hsuan Yang,et al.  Object Tracking Benchmark , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[42]  Xuanjing Huang,et al.  Adversarial Multi-task Learning for Text Classification , 2017, ACL.

[43]  Arnold W. M. Smeulders,et al.  UvA-DARE (Digital Academic Repository) Siamese Instance Search for Tracking , 2016 .

[44]  Yi Wu,et al.  Online Object Tracking: A Benchmark , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[45]  Michael Felsberg,et al.  ATOM: Accurate Tracking by Overlap Maximization , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[46]  Zdenek Kalal,et al.  Tracking-Learning-Detection , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[47]  Andrew Zisserman,et al.  What have We Learned from Deep Representations for Action Recognition? , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[48]  Jitendra Malik,et al.  Finding action tubes , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[49]  Lei Xie,et al.  An ensemble of deep neural networks for object tracking , 2014, 2014 IEEE International Conference on Image Processing (ICIP).

[50]  Wenbing Tao,et al.  Once for All: A Two-Flow Convolutional Neural Network for Visual Tracking , 2016, IEEE Transactions on Circuits and Systems for Video Technology.

[51]  Anders Søgaard,et al.  Deep multi-task learning with low level tasks supervised at lower layers , 2016, ACL.

[52]  Sebastian Ruder,et al.  An Overview of Multi-Task Learning in Deep Neural Networks , 2017, ArXiv.

[53]  Roberto Cipolla,et al.  Multi-task Learning Using Uncertainty to Weigh Losses for Scene Geometry and Semantics , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[54]  Dit-Yan Yeung,et al.  Learning a Deep Compact Image Representation for Visual Tracking , 2013, NIPS.

[55]  Alexei A. Efros,et al.  Colorful Image Colorization , 2016, ECCV.

[56]  Sinno Jialin Pan,et al.  Distributed Multi-Task Relationship Learning , 2017, KDD.

[57]  Wei Wu,et al.  Distractor-aware Siamese Networks for Visual Object Tracking , 2018, ECCV.

[58]  Ming-Hsuan Yang,et al.  Hierarchical Convolutional Features for Visual Tracking , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[59]  Rich Caruana,et al.  Multitask Learning: A Knowledge-Based Source of Inductive Bias , 1993, ICML.

[60]  Jason Weston,et al.  A unified architecture for natural language processing: deep neural networks with multitask learning , 2008, ICML '08.

[61]  Xuanjing Huang,et al.  Recurrent Neural Network for Text Classification with Multi-Task Learning , 2016, IJCAI.

[62]  Bin Luo,et al.  Learning Target-aware Attention for Robust Tracking with Conditional Adversarial Network , 2019, BMVC.

[63]  Joachim Bingel,et al.  Identifying beneficial task relations for multi-task learning in deep neural networks , 2017, EACL.

[64]  Yoshua Bengio,et al.  Generative Adversarial Nets , 2014, NIPS.

[65]  Jianfei Yu,et al.  Learning Sentence Embeddings with Auxiliary Tasks for Cross-Domain Sentiment Classification , 2016, EMNLP.

[66]  Yoshimasa Tsuruoka,et al.  A Joint Many-Task Model: Growing a Neural Network for Multiple NLP Tasks , 2016, EMNLP.

[67]  Fan Yang,et al.  LaSOT: A High-Quality Benchmark for Large-Scale Single Object Tracking , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[68]  Victor S. Lempitsky,et al.  Unsupervised Domain Adaptation by Backpropagation , 2014, ICML.

[69]  Qingshan Liu,et al.  Robust Visual Tracking via Convolutional Networks Without Training , 2015, IEEE Transactions on Image Processing.

[70]  Vibhav Vineet,et al.  Struck: Structured Output Tracking with Kernels , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[71]  Ming-Wei Chang,et al.  BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.

[72]  Leonidas J. Guibas,et al.  Taskonomy: Disentangling Task Transfer Learning , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[73]  Luca Bertinetto,et al.  Staple: Complementary Learners for Real-Time Tracking , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[74]  Xin Zhao,et al.  GOT-10k: A Large High-Diversity Benchmark for Generic Object Tracking in the Wild , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[75]  Xuanjing Huang,et al.  Deep Multi-Task Learning with Shared Memory for Text Classification , 2016, EMNLP.

[76]  Rynson W. H. Lau,et al.  VITAL: VIsual Tracking via Adversarial Learning , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[77]  Brian Kingsbury,et al.  New types of deep neural network learning for speech recognition and related applications: an overview , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.

[78]  James E. Baker,et al.  Reducing Bias and Inefficienry in the Selection Algorithm , 1987, ICGA.

[79]  Michael Felsberg,et al.  Deep motion features for visual tracking , 2016, 2016 23rd International Conference on Pattern Recognition (ICPR).

[80]  Thomas Brox,et al.  FlowNet: Learning Optical Flow with Convolutional Networks , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[81]  Fei-Fei Li,et al.  Large-Scale Video Classification with Convolutional Neural Networks , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[82]  Trevor Darrell,et al.  Factorized Latent Spaces with Structured Sparsity , 2010, NIPS.

[83]  Andrew Zisserman,et al.  Multi-task Self-Supervised Visual Learning , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[84]  Yongxin Yang,et al.  Trace Norm Regularised Deep Multi-Task Learning , 2016, ICLR.

[85]  Andrew Zisserman,et al.  Convolutional Two-Stream Network Fusion for Video Action Recognition , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[86]  Pascal Vincent,et al.  Representation Learning: A Review and New Perspectives , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[87]  Ross B. Girshick,et al.  Fast R-CNN , 2015, 1504.08083.

[88]  Alexander Kolesnikov,et al.  Revisiting Self-Supervised Visual Representation Learning , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[89]  Michael Felsberg,et al.  Learning Spatially Regularized Correlation Filters for Visual Tracking , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[90]  Tomaso A. Poggio,et al.  Example-Based Learning for View-Based Human Face Detection , 1998, IEEE Trans. Pattern Anal. Mach. Intell..

[91]  Thomas Brox,et al.  Object Detection, Tracking, and Motion Segmentation for Object-level Video Segmentation , 2016, ArXiv.

[92]  Cordelia Schmid,et al.  Long-Term Temporal Convolutions for Action Recognition , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[93]  Jin Gao,et al.  Transfer Learning Based Visual Tracking with Gaussian Processes Regression , 2014, ECCV.

[94]  Andrea Vedaldi,et al.  MatConvNet: Convolutional Neural Networks for MATLAB , 2014, ACM Multimedia.

[95]  Andrew Zisserman,et al.  Learning and Using the Arrow of Time , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[96]  Michael Felsberg,et al.  Convolutional Features for Correlation Filter Based Visual Tracking , 2015, 2015 IEEE International Conference on Computer Vision Workshop (ICCVW).

[97]  Michael Felsberg,et al.  ECO: Efficient Convolution Operators for Tracking , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[98]  Rahul Sukthankar,et al.  Rethinking the Faster R-CNN Architecture for Temporal Action Localization , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[99]  Bernard Ghanem,et al.  A Benchmark and Simulator for UAV Tracking , 2016, ECCV.

[100]  Xiao Wang,et al.  SINT++: Robust Visual Tracking via Adversarial Positive Instance Generation , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[101]  Bing Liu,et al.  STResNet_CF Tracker: The Deep Spatiotemporal Features Learning for Correlation Filter Based Robust Visual Object Tracking , 2019, IEEE Access.