M5L: Multi-Modal Multi-Margin Metric Learning for RGBT Tracking

Classifying confusing samples in the course of RGBT tracking is a challenging problem that has not yet been satisfactorily solved. Existing methods focus only on enlarging the boundary between positive and negative samples, which can harm the structured information of the samples; for example, confusing positive samples may end up closer to the anchor than normal positive samples. To handle this problem, we propose a novel Multi-Modal Multi-Margin Metric Learning framework, named M$^5$L, for RGBT tracking in this paper. In particular, we design a multi-margin structured loss to distinguish the confusing samples, which play the most critical role in boosting tracking performance. Specifically, we additionally enlarge the boundaries between confusing positive samples and normal ones, and between confusing negative samples and normal ones, with predefined margins, by exploiting the structured information of all samples in each modality. Moreover, a cross-modality constraint is employed to reduce the difference between modalities and to push positive samples from both modalities closer to the anchor than negative ones. In addition, to achieve quality-aware fusion of RGB and thermal features, we introduce modality attentions and learn them with a feature fusion module in our network. Extensive experiments on large-scale datasets show that our framework clearly improves tracking performance and outperforms state-of-the-art RGBT trackers.
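The multi-margin idea described above can be illustrated with a small sketch. The code below is not the paper's exact loss; it is a minimal, hypothetical hinge-style formulation assuming the intended structure is an ordering, by distance to the anchor, of normal positives, then confusing positives, then confusing negatives, then normal negatives, with a predefined margin between each adjacent pair of groups. All function and variable names here are illustrative.

```python
import numpy as np

def multi_margin_loss(anchor, pos_norm, pos_conf, neg_conf, neg_norm,
                      m1=0.1, m2=0.2, m3=0.3):
    """Illustrative multi-margin hinge loss (a sketch, not the paper's exact form).

    Enforces, in Euclidean distance from the anchor, the ordering
        normal positives < confusing positives < confusing negatives < normal negatives
    with predefined margins m1, m2, m3 between adjacent groups.
    Each sample argument is an array of embeddings with shape (n, d).
    """
    d = lambda x: np.linalg.norm(anchor - x, axis=-1)  # distances to the anchor
    # normal positives should be at least m1 closer than confusing positives
    l1 = np.maximum(0.0, d(pos_norm) - d(pos_conf) + m1).mean()
    # confusing positives should be at least m2 closer than confusing negatives
    l2 = np.maximum(0.0, d(pos_conf) - d(neg_conf) + m2).mean()
    # confusing negatives should be at least m3 closer than normal negatives
    l3 = np.maximum(0.0, d(neg_conf) - d(neg_norm) + m3).mean()
    return l1 + l2 + l3
```

When the embeddings already satisfy the ordering with enough slack, every hinge term is zero and the loss vanishes; a confusing positive drifting farther from the anchor than a confusing negative, say, activates the corresponding hinge. In the full framework such a per-modality loss would be combined with the cross-modality constraint and attention-based fusion described above.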
