Hierarchical Proxy-based Loss for Deep Metric Learning

Proxy-based metric learning losses are superior to pairbased losses due to their fast convergence and low training complexity. However, existing proxy-based losses focus on learning class-discriminative features while overlooking the commonalities shared across classes which are potentially useful in describing and matching samples. Moreover, they ignore the implicit hierarchy of categories in realworld datasets, where similar subordinate classes can be grouped together. In this paper, we present a framework that leverages this implicit hierarchy by imposing a hierarchical structure on the proxies and can be used with any existing proxy-based loss. This allows our model to capture both class-discriminative features and class-shared characteristics without breaking the implicit data hierarchy. We evaluate our method on five established image retrieval datasets such as In-Shop and SOP. Results demonstrate that our hierarchical proxy-based loss framework improves the performance of existing proxy-based losses, especially on large datasets which exhibit strong hierarchical structure.

[1]  Weilin Huang,et al.  Cross-Batch Memory for Embedding Learning , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[2]  Ser-Nam Lim,et al.  A Metric Learning Reality Check , 2020, ECCV.

[3]  Andrew Zisserman,et al.  Smooth-AP: Smoothing the Path Towards Large-Scale Image Retrieval , 2020, ECCV.

[4]  Alexander J. Smola,et al.  Sampling Matters in Deep Embedding Learning , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[5]  Yann LeCun,et al.  Dimensionality Reduction by Learning an Invariant Mapping , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[6]  Zuozhuo Dai,et al.  Batch DropBlock Network for Person Re-Identification and Beyond , 2018, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[7]  Silvio Savarese,et al.  Deep Metric Learning via Lifted Structured Feature Embedding , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[8]  Jan-Michael Frahm,et al.  Hierarchy of Alternating Specialists for Scene Recognition , 2018, ECCV.

[9]  Kaiqi Huang,et al.  Beyond Triplet Loss: A Deep Quadruplet Network for Person Re-identification , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[10]  Nuno Vasconcelos,et al.  PIEs: Pose Invariant Embeddings , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[11]  Xiaogang Wang,et al.  DeepFashion: Powering Robust Clothes Recognition and Retrieval with Rich Annotations , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[12]  Karsten Roth,et al.  MIC: Mining Interclass Characteristics for Improved Metric Learning , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[13]  Matthew R. Scott,et al.  Multi-Similarity Loss With General Pair Weighting for Deep Metric Learning , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[14]  Graham W. Taylor,et al.  ProxyNCA++: Revisiting and Revitalizing Proxy Neighborhood Component Analysis , 2020, ECCV.

[15]  James Philbin,et al.  FaceNet: A unified embedding for face recognition and clustering , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[16]  Geoffrey E. Hinton,et al.  Visualizing Data using t-SNE , 2008 .

[17]  Björn Ommer,et al.  Divide and Conquer the Embedding Space for Metric Learning , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[18]  Dorin Comaniciu,et al.  Deep Decision Network for Multi-class Image Classification , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[19]  Joseph Paul Cohen,et al.  DiVA: Diverse Visual Feature Aggregation for Deep Metric Learning , 2020, ECCV.

[20]  Weilin Huang,et al.  Deep Metric Learning with Hierarchical Triplet Loss , 2018, ECCV.

[21]  Li Fei-Fei,et al.  ImageNet: A large-scale hierarchical image database , 2009, CVPR.

[22]  Xing Ji,et al.  CosFace: Large Margin Cosine Loss for Deep Face Recognition , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[23]  Marcello Pelillo,et al.  The Group Loss for Deep Metric Learning , 2020, ECCV.

[24]  Jasper Snoek,et al.  Practical Bayesian Optimization of Machine Learning Algorithms , 2012, NIPS.

[25]  Michael S. Bernstein,et al.  ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.

[26]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[27]  Robinson Piramuthu,et al.  HD-CNN: Hierarchical Deep Convolutional Neural Networks for Large Scale Visual Recognition , 2014, 2015 IEEE International Conference on Computer Vision (ICCV).

[28]  Miin-Shen Yang,et al.  A robust EM clustering algorithm for Gaussian mixture models , 2012, Pattern Recognit..

[29]  Haibo Wang,et al.  Self-supervising Fine-grained Region Similarities for Large-scale Image Localization , 2020, ECCV.

[30]  Pablo Piantanida,et al.  A Unifying Mutual Information View of Metric Learning: Cross-Entropy vs. Pairwise Losses , 2020, ECCV.

[31]  Trevor Darrell,et al.  Reducing Class Collapse in Metric Learning with Easy Positive Sampling , 2020, ArXiv.

[32]  Qi Qian,et al.  SoftTriple Loss: Deep Metric Learning Without Triplet Sampling , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[33]  Kihyuk Sohn,et al.  Improved Deep Metric Learning with Multi-class N-pair Loss Objective , 2016, NIPS.

[34]  Stefanos Zafeiriou,et al.  ArcFace: Additive Angular Margin Loss for Deep Face Recognition , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[35]  Geoffrey E. Hinton,et al.  Learning internal representations by error propagation , 1986 .

[36]  Yang Song,et al.  The iNaturalist Species Classification and Detection Dataset , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[37]  Yair Movshovitz-Attias,et al.  No Fuss Distance Metric Learning Using Proxies , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[38]  Chen Change Loy,et al.  Online Deep Clustering for Unsupervised Representation Learning , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[39]  Sinisa Todorovic,et al.  Ensemble Deep Manifold Similarity Learning Using Hard Proxies , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[40]  Jonathan Krause,et al.  3D Object Representations for Fine-Grained Categorization , 2013, 2013 IEEE International Conference on Computer Vision Workshops.

[41]  Suha Kwak,et al.  Proxy Anchor Loss for Deep Metric Learning , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[42]  Sergey Ioffe,et al.  Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.

[43]  Horst Possegger,et al.  HiBsteR: Hierarchical Boosted Deep Metric Learning for Image Retrieval , 2019, 2019 IEEE Winter Conference on Applications of Computer Vision (WACV).