Keypoint-Aligned Embeddings for Image Retrieval and Re-identification

Learning embeddings that are invariant to object pose is crucial in image retrieval and re-identification. Existing approaches to person, vehicle, or animal re-identification suffer from high intra-class variance caused by deformable shapes and differing camera viewpoints. To overcome this limitation, we propose aligning the image embedding with a predefined order of keypoints. The proposed keypoint-aligned embeddings model (KAE-Net) learns part-level features via multi-task learning guided by keypoint locations. More specifically, KAE-Net learns an auxiliary heatmap-reconstruction task for each keypoint and thereby extracts the feature-map channels that are activated by that keypoint. KAE-Net is compact, generic, and conceptually simple. It achieves state-of-the-art performance on the CUB-200-2011, Cars196, and VeRi-776 benchmark datasets for retrieval and re-identification tasks.
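The idea of tying channel groups to keypoints can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the equal-size channel split, the per-group linear weighting (standing in for a 1x1 convolution), the Gaussian target heatmaps, the mean-squared-error loss, and the visibility mask are all assumptions made for illustration, and every function name below is hypothetical.

```python
import numpy as np


def gaussian_heatmap(h, w, cx, cy, sigma=1.5):
    """Assumed target: a 2D Gaussian centred on keypoint (cx, cy)."""
    ys, xs = np.mgrid[0:h, 0:w]
    return np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2.0 * sigma ** 2))


def keypoint_aligned_embedding(feature_map, num_keypoints):
    """Split channels into one group per keypoint, pool each group, and
    concatenate the pooled vectors in the fixed keypoint order."""
    c, h, w = feature_map.shape
    assert c % num_keypoints == 0, "channels must divide evenly (assumption)"
    groups = feature_map.reshape(num_keypoints, c // num_keypoints, h, w)
    pooled = groups.mean(axis=(2, 3))  # global average pooling per group
    return groups, pooled.reshape(-1)


def heatmap_reconstruction_loss(groups, weights, targets, visible):
    """Auxiliary loss: each channel group reconstructs its keypoint heatmap.
    Keypoints flagged as not visible are ignored (assumption)."""
    # Weighted sum over the channels of each group -> one map per keypoint.
    recon = np.einsum("kc,kchw->khw", weights, groups)
    per_keypoint = ((recon - targets) ** 2).mean(axis=(1, 2))
    return float((per_keypoint * visible).sum() / max(visible.sum(), 1))


# Toy example with random features in place of a CNN backbone output.
rng = np.random.default_rng(0)
K, C, H, W = 4, 32, 8, 8
fmap = rng.standard_normal((C, H, W))
groups, emb = keypoint_aligned_embedding(fmap, K)

weights = rng.standard_normal((K, C // K)) * 0.1
keypoints = [(1, 1), (6, 2), (3, 5), (0, 0)]   # (x, y) per keypoint
visible = np.array([1.0, 1.0, 1.0, 0.0])       # last keypoint is hidden
targets = np.stack([gaussian_heatmap(H, W, x, y) for x, y in keypoints])
aux_loss = heatmap_reconstruction_loss(groups, weights, targets, visible)
```

In a multi-task setup, an auxiliary term like aux_loss would be added to the main embedding loss with some weighting, so that each slice of the embedding is encouraged to respond to one keypoint. The weighting and the choice of main loss are not specified in the abstract and are left out of the sketch.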
