Learning Modality-Consistency Feature Templates: A Robust RGB-Infrared Tracking System

With video surveillance systems widely deployed to meet industrial security requirements, object tracking, which aims to locate objects of interest in videos, has become an important task. Although numerous tracking algorithms for RGB videos have been developed over the past decade, their performance and robustness can degrade dramatically when the information from the RGB video is unreliable (e.g., under poor illumination or at very low resolution). To address this issue, this paper presents a new tracking system that combines information from the RGB and infrared modalities. The proposed tracking system is built on a new machine learning model. In particular, the learning model alleviates the modality discrepancy issue by imposing a modality-consistency constraint on both representation patterns and discriminability, and generates discriminative feature templates for collaborative representation and discrimination across the heterogeneous modalities. Experiments on a variety of challenging RGB-infrared videos demonstrate the effectiveness of the proposed algorithm.
