Tracking the Known and the Unknown by Leveraging Semantic Information

Current research in visual tracking is largely focused on the generic case, where no prior knowledge about the target object is assumed. However, many real-world tracking applications stem from specific scenarios where the class or type of object is known. In this work, we propose a tracking framework that can exploit this semantic information, without sacrificing the generic nature of the tracker. In addition to the target-specific appearance, we model the class of the object through a semantic module that provides complementary class-specific predictions. By further integrating a semantic classification module, we can utilize the learned class-specific models even if the target class is unknown. Our unified tracking architecture is trained end-to-end on large scale tracking datasets by exploiting the available semantic metadata. Comprehensive experiments are performed on five tracking benchmarks. Our approach achieves state-of-the-art performance while operating at real-time frame-rates. The code and the trained models are available at https://tracking.vision.ee.ethz.ch/track-known-unknown/.

[1]  Michael Felsberg,et al.  ECO: Efficient Convolution Operators for Tracking , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[2]  Bernard Ghanem,et al.  A Benchmark and Simulator for UAV Tracking , 2016, ECCV.

[3]  Ali Farhadi,et al.  YOLO9000: Better, Faster, Stronger , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[4]  Michael J. Black,et al.  Optical Flow with Semantic Segmentation and Localized Layers , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[5]  Michael Felsberg,et al.  ATOM: Accurate Tracking by Overlap Maximization , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[6]  Michael Felsberg,et al.  The Sixth Visual Object Tracking VOT2018 Challenge Results , 2018, ECCV Workshops.

[7]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[8]  Luca Bertinetto,et al.  End-to-End Representation Learning for Correlation Filter Based Tracking , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[9]  Michael Felsberg,et al.  Discriminative Scale Space Tracking , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[10]  George A. Miller,et al.  WordNet: A Lexical Database for English , 1995, HLT.

[11]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[12]  Yuning Jiang,et al.  Acquisition of Localization Confidence for Accurate Object Detection , 2018, ECCV.

[13]  Michael J. Black,et al.  Optical Flow in Mostly Rigid Scenes , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[14]  Huchuan Lu,et al.  Correlation Tracking via Joint Discrimination and Reliability Learning , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[15]  Michael Felsberg,et al.  Adaptive Color Attributes for Real-Time Visual Tracking , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[16]  Ming-Hsuan Yang,et al.  Object Tracking Benchmark , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[17]  Qiang Wang,et al.  Tracking-by-Fusion via Gaussian Process Regression Extended to Transfer Learning , 2020, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[18]  Chong Luo,et al.  A Twofold Siamese Network for Real-Time Object Tracking , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[19]  Li Fei-Fei,et al.  ImageNet: A large-scale hierarchical image database , 2009, CVPR.

[20]  Wei Wu,et al.  High Performance Visual Tracking with Siamese Region Proposal Network , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[21]  Michael Felsberg,et al.  Adaptive Decontamination of the Training Set: A Unified Formulation for Discriminative Visual Tracking , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[22]  Michael Felsberg,et al.  Unveiling the Power of Deep Tracking , 2018, ECCV.

[23]  Luca Antiga,et al.  Automatic differentiation in PyTorch , 2017 .

[24]  Fan Yang,et al.  LaSOT: A High-Quality Benchmark for Large-Scale Single Object Tracking , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[25]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[26]  Wei Wu,et al.  SiamRPN++: Evolution of Siamese Visual Tracking With Very Deep Networks , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[27]  Jiri Matas,et al.  Discriminative Correlation Filter Tracker with Channel and Spatial Reliability , 2016, International Journal of Computer Vision.

[28]  Bernard Ghanem,et al.  TrackingNet: A Large-Scale Dataset and Benchmark for Object Tracking in the Wild , 2018, ECCV.

[29]  Ming-Hsuan Yang,et al.  Structural Constraint Data Association for Online Multi-object Tracking , 2018, International Journal of Computer Vision.

[30]  Pietro Perona,et al.  Microsoft COCO: Common Objects in Context , 2014, ECCV.

[31]  Marc Pollefeys,et al.  Discrete optimization of ray potentials for semantic 3D reconstruction , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[32]  Luca Bertinetto,et al.  Staple: Complementary Learners for Real-Time Tracking , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[33]  Xin Zhao,et al.  GOT-10k: A Large High-Diversity Benchmark for Generic Object Tracking in the Wild , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[34]  Bohyung Han,et al.  Learning Multi-domain Convolutional Neural Networks for Visual Tracking , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[35]  Rui Caseiro,et al.  High-Speed Tracking with Kernelized Correlation Filters , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[36]  Luca Bertinetto,et al.  Fully-Convolutional Siamese Networks for Object Tracking , 2016, ECCV Workshops.

[37]  Josef Kittler,et al.  Learning Adaptive Discriminative Correlation Filters via Temporal Consistency Preserving Spatial Feature Selection for Robust Visual Object Tracking , 2018, IEEE Transactions on Image Processing.

[38]  Wei Liu,et al.  SSD: Single Shot MultiBox Detector , 2015, ECCV.

[39]  Michael Felsberg,et al.  Beyond Correlation Filters: Learning Continuous Convolution Operators for Visual Tracking , 2016, ECCV.

[40]  Simon Lucey,et al.  Need for Speed: A Benchmark for Higher Frame Rate Object Tracking , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[41]  Michael Felsberg,et al.  Learning Spatially Regularized Correlation Filters for Visual Tracking , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[42]  Kostia Robert Night-Time Traffic Surveillance: A Robust Framework for Multi-vehicle Detection, Classification and Tracking , 2009, 2009 Sixth IEEE International Conference on Advanced Video and Signal Based Surveillance.