Paper Information - Keypoint-Aligned Embeddings for Image Retrieval and Re-identification

Learning embeddings that are invariant to the pose of an object is crucial in image retrieval and re-identification. Existing approaches to person, vehicle, or animal re-identification suffer from high intra-class variance caused by deformable shapes and differing camera viewpoints. To overcome this limitation, we propose to align the image embedding with a predefined order of keypoints. The proposed keypoint-aligned embeddings model (KAE-Net) learns part-level features via multi-task learning guided by keypoint locations. More specifically, for each keypoint, KAE-Net extracts the channels of a feature map that are activated by that keypoint by learning an auxiliary task: reconstructing the keypoint's heatmap. KAE-Net is compact, generic, and conceptually simple. It achieves state-of-the-art performance on the CUB-200-2011, Cars196, and VeRi-776 benchmark datasets for retrieval and re-identification tasks.
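
The idea of tying channel groups to keypoints through heatmap reconstruction can be illustrated with a small, dependency-free sketch. This is not the authors' implementation: it assumes that the feature map is split into equal, consecutive channel groups (one per keypoint, in the predefined order), that the reconstruction target is a Gaussian heatmap, and that a plain channel average stands in for a learned reconstruction head. All function and parameter names (`gaussian_heatmap`, `keypoint_aligned_embedding`, `channels_per_kp`, and so on) are hypothetical.

```python
import math

def gaussian_heatmap(h, w, cx, cy, sigma=1.0):
    """Assumed target: a 2D Gaussian centred on the keypoint (cx, cy)."""
    return [[math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * sigma ** 2))
             for x in range(w)] for y in range(h)]

def channel_mean(maps):
    """Average several HxW maps into one (stand-in for a learned decoder)."""
    n, h, w = len(maps), len(maps[0]), len(maps[0][0])
    return [[sum(m[y][x] for m in maps) / n for x in range(w)] for y in range(h)]

def mse(a, b):
    """Mean squared error between two HxW maps."""
    h, w = len(a), len(a[0])
    return sum((a[y][x] - b[y][x]) ** 2
               for y in range(h) for x in range(w)) / (h * w)

def global_avg_pool(m):
    """Collapse one HxW map to a single scalar feature."""
    return sum(sum(row) for row in m) / (len(m) * len(m[0]))

def keypoint_aligned_embedding(feature_map, keypoints, channels_per_kp):
    """feature_map: list of C maps (each HxW); keypoints: ordered (cx, cy) pairs.

    Returns (embedding, aux_loss). The embedding concatenates the pooled
    channel groups in keypoint order; aux_loss is the mean heatmap
    reconstruction error over keypoints.
    """
    h, w = len(feature_map[0]), len(feature_map[0][0])
    embedding, aux_loss = [], 0.0
    for k, (cx, cy) in enumerate(keypoints):
        # Channel group k is reserved for keypoint k (an assumption of this sketch).
        group = feature_map[k * channels_per_kp:(k + 1) * channels_per_kp]
        aux_loss += mse(channel_mean(group), gaussian_heatmap(h, w, cx, cy))
        embedding.extend(global_avg_pool(m) for m in group)
    return embedding, aux_loss / len(keypoints)

# Toy usage: two keypoints, two channels each, on an 8x8 feature map whose
# channels already respond at the matching keypoint locations.
H = W = 8
kps = [(2, 2), (5, 6)]
fmap = [gaussian_heatmap(H, W, 2, 2), gaussian_heatmap(H, W, 2, 2),
        gaussian_heatmap(H, W, 5, 6), gaussian_heatmap(H, W, 5, 6)]
emb, loss = keypoint_aligned_embedding(fmap, kps, channels_per_kp=2)
_, swapped_loss = keypoint_aligned_embedding(fmap, kps[::-1], channels_per_kp=2)
```

In this toy setup the auxiliary loss is lowest when each channel group responds at its own keypoint, and it grows when the keypoint order is swapped. That is the sense in which the embedding positions become aligned with keypoints. In a trained model this term would be combined with a retrieval or re-identification loss, which the sketch omits.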
