HiBsteR: Hierarchical Boosted Deep Metric Learning for Image Retrieval

When the number of categories is growing into thousands, large-scale image retrieval becomes an increasingly hard task. Retrieval accuracy can be improved by learning distance metric methods that separate categories in a transformed embedding space. Unlike most methods that utilize a single embedding to learn a distance metric, we build on the idea of boosted metric learning, where an embedding is split into a boosted ensemble of embeddings. While in general metric learning is directly applied on fine labels to learn embeddings, we take this one step further and incorporate hierarchical label information into the boosting framework and show how to properly adapt loss functions for this purpose. We show that by introducing several sub-embeddings which focus on specific hierarchical classes, the retrieval accuracy can be improved compared to standard flat label embeddings. The proposed method is especially suitable for exploiting hierarchical datasets or when additional labels can be retrieved without much effort. Our approach improves R@1 over state-of-the-art methods on the biggest available retrieval dataset (Stanford Online Products) and sets new reference baselines for hierarchical metric learning on several other datasets (CUB-200-2011, VegFru, FruitVeg-81). We show that the clustering quality in terms of NMI score is superior to previous works.

[1]  Victor S. Lempitsky,et al.  Learning Deep Embeddings with Histogram Loss , 2016, NIPS.

[2]  George A. Miller,et al.  WordNet: A Lexical Database for English , 1995, HLT.

[3]  Dumitru Erhan,et al.  Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[4]  Florent Perronnin,et al.  Fisher Kernels on Visual Vocabularies for Image Categorization , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[5]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[6]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[7]  Lei Wang,et al.  Positive Semidefinite Metric Learning with Boosting , 2009, NIPS.

[8]  Frédéric Jurie,et al.  Boosted Metric Learning for Efficient Identity-Based Face Retrieval , 2015, BMVC.

[9]  Lei Wang,et al.  PSDBoost: Matrix-Generation Linear Programming for Positive Semidefinite Matrices Learning , 2008, NIPS.

[10]  James Philbin,et al.  FaceNet: A unified embedding for face recognition and clustering , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[11]  Leonidas J. Guibas,et al.  PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space , 2017, NIPS.

[12]  Kristen Grauman,et al.  Learning a Tree of Metrics with Disjoint Visual Features , 2011, NIPS.

[13]  Georg Waltner,et al.  Personalized Dietary Self-Management Using Mobile Vision-Based Assistance , 2017, ICIAP Workshops.

[14]  Matthieu Cord,et al.  Quadruplet-Wise Image Similarity Learning , 2013, 2013 IEEE International Conference on Computer Vision.

[15]  Cordelia Schmid,et al.  Aggregating local descriptors into a compact image representation , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[16]  Patrick Pérez,et al.  Some Faces are More Equal than Others: Hierarchical Organization for Accurate and Efficient Large-Scale Identity-Based Face Retrieval , 2014, ECCV Workshops.

[17]  Hinrich Schütze,et al.  Introduction to information retrieval , 2008 .

[18]  Chao Zhang,et al.  Hard-Aware Deeply Cascaded Embedding , 2016, 2017 IEEE International Conference on Computer Vision (ICCV).

[19]  Horst Possegger,et al.  Deep Metric Learning with BIER: Boosting Independent Embeddings Robustly , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[20]  Yair Movshovitz-Attias,et al.  No Fuss Distance Metric Learning Using Proxies , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[21]  Xiang Yu,et al.  Deep Metric Learning via Lifted Structured Feature Embedding , 2016 .

[22]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[23]  Vinod Nair,et al.  Learning hierarchical similarity metrics , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[24]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[25]  Stefanie Jegelka,et al.  Deep Metric Learning via Facility Location , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[26]  Zhiqiang Shen,et al.  Multiple Granularity Descriptors for Fine-Grained Categorization , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[27]  Yann LeCun,et al.  Learning a similarity metric discriminatively, with application to face verification , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[28]  Marc Sebban,et al.  A Survey on Metric Learning for Feature Vectors and Structured Data , 2013, ArXiv.

[29]  Horst Possegger,et al.  BIER — Boosting Independent Embeddings Robustly , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[30]  Yoav Freund,et al.  A decision-theoretic generalization of on-line learning and an application to boosting , 1995, EuroCOLT.

[31]  François Laviolette,et al.  Domain-Adversarial Training of Neural Networks , 2015, J. Mach. Learn. Res..

[32]  Raquel Urtasun,et al.  Deep Spectral Clustering Learning , 2017, ICML.

[33]  Pietro Perona,et al.  The Caltech-UCSD Birds-200-2011 Dataset , 2011 .

[34]  Baba C. Vemuri,et al.  A Robust and Efficient Doubly Regularized Metric Learning Approach , 2012, ECCV.

[35]  Yoram Singer,et al.  An Efficient Boosting Algorithm for Combining Preferences by , 2013 .

[36]  Robinson Piramuthu,et al.  HD-CNN: Hierarchical Deep Convolutional Neural Networks for Large Scale Visual Recognition , 2014, 2015 IEEE International Conference on Computer Vision (ICCV).

[37]  Kihyuk Sohn,et al.  Improved Deep Metric Learning with Multi-class N-pair Loss Objective , 2016, NIPS.

[38]  Zilei Wang,et al.  VegFru: A Domain-Specific Dataset for Fine-Grained Visual Categorization , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[39]  James Ze Wang,et al.  Image retrieval: Ideas, influences, and trends of the new age , 2008, CSUR.

[40]  Jinbo Bi,et al.  AdaBoost on low-rank PSD matrices for metric learning , 2011, CVPR 2011.