Synthetic Data Generation for End-to-End Thermal Infrared Tracking

The use of both off-the-shelf and end-to-end trained deep networks has significantly improved the performance of visual tracking on RGB videos. However, the lack of large labeled datasets hampers the use of convolutional neural networks for tracking in thermal infrared (TIR) images. As a result, most state-of-the-art TIR tracking methods are still based on handcrafted features. To address this problem, we propose to use image-to-image translation models, which allow us to translate the abundantly available labeled RGB data into synthetic TIR data. We explore both paired and unpaired image translation models for this purpose. These methods provide us with a large labeled dataset of synthetic TIR sequences, on which we can train end-to-end optimal features for tracking. To the best of our knowledge, we are the first to train end-to-end features for TIR tracking. We perform extensive experiments on the VOT-TIR2017 dataset. We show that a network trained on a large dataset of synthetic TIR data obtains better performance than one trained on the available real TIR data, and that combining both data sources leads to further improvement. In addition, when we combine the network with motion features, we outperform the state of the art with a relative gain of over 10%, clearly showing the effectiveness of using synthetic data to train end-to-end TIR trackers.
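The data-generation idea described above can be illustrated with a minimal sketch: each RGB frame is passed through a translation function while the bounding-box labels are carried over unchanged, so the synthetic TIR sequence stays fully annotated. Everything named here is hypothetical and only for illustration: `LabeledSequence`, `translate_sequence`, and in particular `luminance_stub`, which is a simple grayscale conversion standing in for a trained paired or unpaired image-to-image translation model. It is not the translation model used in this work.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical, simplified types: a frame is a nested list of pixels.
RGBPixel = Tuple[int, int, int]
RGBFrame = List[List[RGBPixel]]
TIRFrame = List[List[int]]          # single-channel intensity
Box = Tuple[int, int, int, int]     # (x, y, width, height)


@dataclass
class LabeledSequence:
    """A video sequence with one target bounding box per frame."""
    frames: list
    boxes: List[Box]


def luminance_stub(frame: RGBFrame) -> TIRFrame:
    """Placeholder translator (assumption): plain luminance conversion.

    A real pipeline would call a trained image-to-image translation
    generator here instead.
    """
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
        for row in frame
    ]


def translate_sequence(
    seq: LabeledSequence,
    translator: Callable[[RGBFrame], TIRFrame],
) -> LabeledSequence:
    """Translate every frame; keep the RGB annotations as TIR labels."""
    synthetic_frames = [translator(frame) for frame in seq.frames]
    return LabeledSequence(frames=synthetic_frames, boxes=list(seq.boxes))


# Toy usage: one 2x2 RGB frame with a box covering the whole image.
rgb_seq = LabeledSequence(
    frames=[[[(255, 0, 0), (0, 255, 0)], [(0, 0, 255), (255, 255, 255)]]],
    boxes=[(0, 0, 2, 2)],
)
tir_seq = translate_sequence(rgb_seq, luminance_stub)
```

The point of the sketch is the label reuse: translation changes only the pixel modality, so the existing RGB tracking annotations can serve directly as supervision for training on the synthetic TIR frames.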
