Divide and Conquer the Embedding Space for Metric Learning

Learning the embedding space, where semantically similar objects are located close together and dissimilar objects far apart, is a cornerstone of many computer vision applications. Existing approaches usually learn a single metric in the embedding space for all available data points, which may have a very complex non-uniform distribution with different notions of similarity between objects, e.g. appearance, shape, color or semantic meaning. Approaches for learning a single distance metric often struggle to encode all different types of relationships and do not generalize well. In this work, we propose a novel easy-to-implement divide and conquer approach for deep metric learning, which significantly improves the state-of-the-art performance of metric learning. Our approach utilizes the embedding space more efficiently by jointly splitting the embedding space and data into K smaller sub-problems. It divides both, the data and the embedding space into K subsets and learns K separate distance metrics in the non-overlapping subspaces of the embedding space, defined by groups of neurons in the embedding layer of the neural network. The proposed approach increases the convergence speed and improves generalization since the complexity of each sub-problem is reduced compared to the original one. We show that our approach outperforms the state-of-the-art by a large margin in retrieval, clustering and re-identification tasks on CUB200-2011, CARS196, Stanford Online Products, In-shop Clothes and PKU VehicleID datasets. Source code: https://bit.ly/dcesml.

[1]  James Philbin,et al.  FaceNet: A unified embedding for face recognition and clustering , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[2]  Chen Huang,et al.  Unsupervised Learning of Discriminative Attributes and Visual Representations , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[3]  Jinhui Tang,et al.  Weakly Supervised Deep Metric Learning for Community-Contributed Image Retrieval , 2015, IEEE Transactions on Multimedia.

[4]  Jonathan Krause,et al.  3D Object Representations for Fine-Grained Categorization , 2013, 2013 IEEE International Conference on Computer Vision Workshops.

[5]  Yang Song,et al.  Learning Fine-Grained Image Similarity with Deep Ranking , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[6]  Björn Ommer,et al.  Generative regularization with latent topics for discriminative object recognition , 2015, Pattern Recognit..

[7]  Ling-Yu Duan,et al.  Incorporating intra-class variance to fine-grained visual recognition , 2017, 2017 IEEE International Conference on Multimedia and Expo (ICME).

[8]  Shaogang Gong,et al.  Harmonious Attention Network for Person Re-identification , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[9]  Jian Guo,et al.  Deep CNN Ensemble with Data Augmentation for Object Detection , 2015, ArXiv.

[10]  Kurt Hornik,et al.  Approximation capabilities of multilayer feedforward networks , 1991, Neural Networks.

[11]  Jian Wang,et al.  Deep Metric Learning with Angular Loss , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[12]  Silvio Savarese,et al.  Deep Metric Learning via Lifted Structured Feature Embedding , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[13]  Qi Tian,et al.  Scalable Person Re-identification: A Benchmark , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[14]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[15]  Yann LeCun,et al.  Dimensionality Reduction by Learning an Invariant Mapping , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[16]  Björn Ommer,et al.  Improving Spatiotemporal Self-Supervision by Deep Reinforcement Learning , 2018, ECCV.

[17]  Kihyuk Sohn,et al.  Improved Deep Metric Learning with Multi-class N-pair Loss Objective , 2016, NIPS.

[18]  Iasonas Kokkinos,et al.  Discriminative Learning of Deep Convolutional Feature Point Descriptors , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[19]  Kavita Bala,et al.  Learning visual similarity for product design with convolutional neural networks , 2015, ACM Trans. Graph..

[20]  Stefanie Jegelka,et al.  Deep Metric Learning via Facility Location , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[21]  Weilin Huang,et al.  Deep Metric Learning with Hierarchical Triplet Loss , 2018, ECCV.

[22]  Björn Ommer,et al.  Deep unsupervised learning of visual similarities , 2018, Pattern Recognit..

[23]  Xudong Lin,et al.  Deep Adversarial Metric Learning , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[24]  Xiong Chen,et al.  Learning Discriminative Features with Multiple Granularities for Person Re-Identification , 2018, ACM Multimedia.

[25]  Yannis Avrithis,et al.  Mining on Manifolds: Metric Learning Without Labels , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[26]  Xiaogang Wang,et al.  DeepFashion: Powering Robust Clothes Recognition and Retrieval with Rich Annotations , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[27]  Jeff Johnson,et al.  Billion-Scale Similarity Search with GPUs , 2017, IEEE Transactions on Big Data.

[28]  Xudong Lin,et al.  Deep Variational Metric Learning , 2018, ECCV.

[29]  Alexander J. Smola,et al.  Sampling Matters in Deep Embedding Learning , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[30]  Yair Movshovitz-Attias,et al.  No Fuss Distance Metric Learning Using Proxies , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[31]  Robert Pless,et al.  Deep Randomized Ensembles for Metric Learning , 2018, ECCV.

[32]  Hinrich Schütze,et al.  Introduction to information retrieval , 2008 .

[33]  Cordelia Schmid,et al.  Product Quantization for Nearest Neighbor Search , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[34]  Horst Possegger,et al.  BIER — Boosting Independent Embeddings Robustly , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[35]  Yoav Freund,et al.  A decision-theoretic generalization of on-line learning and an application to boosting , 1995, EuroCOLT.

[36]  Hongtao Lu,et al.  An Adversarial Approach to Hard Triplet Generation , 2018, ECCV.

[37]  Zhuo Chen,et al.  Deep clustering: Discriminative embeddings for segmentation and separation , 2015, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[38]  Yang Song,et al.  Improving the Robustness of Deep Neural Networks via Stability Training , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[39]  Nir Ailon,et al.  Deep Metric Learning Using Triplet Network , 2014, SIMBAD.

[40]  Tao Mei,et al.  Deep Collaborative Embedding for Social Image Understanding , 2019, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[41]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[42]  Gustavo Carneiro,et al.  Smart Mining for Deep Metric Learning , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[43]  Xiaogang Wang,et al.  Deep Group-Shuffling Random Walk for Person Re-identification , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[44]  Björn Ommer,et al.  CliqueCNN: Deep Unsupervised Exemplar Learning , 2016, NIPS.

[45]  Yann LeCun,et al.  Learning a similarity metric discriminatively, with application to face verification , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[46]  Chao Zhang,et al.  Hard-Aware Deeply Cascaded Embedding , 2016, 2017 IEEE International Conference on Computer Vision (ICCV).

[47]  Yoshua Bengio,et al.  Generative Adversarial Nets , 2014, NIPS.

[48]  M. Saquib Sarfraz,et al.  A Pose-Sensitive Embedding for Person Re-identification with Expanded Cross Neighborhood Re-ranking , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[49]  Horst Possegger,et al.  Deep Metric Learning with BIER: Boosting Independent Embeddings Robustly , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[50]  Nikos Komodakis,et al.  Learning to compare image patches via convolutional neural networks , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[51]  Nicu Sebe,et al.  Group Consistent Similarity Learning via Deep CRF for Person Re-identification , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[52]  Vincent Lepetit,et al.  Learning descriptors for object recognition and 3D pose estimation , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[53]  Victor S. Lempitsky,et al.  Learning Deep Embeddings with Histogram Loss , 2016, NIPS.

[54]  Björn Ommer,et al.  Deep Unsupervised Similarity Learning Using Partially Ordered Sets , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[55]  Pietro Perona,et al.  The Caltech-UCSD Birds-200-2011 Dataset , 2011 .

[56]  Tiejun Huang,et al.  Deep Relative Distance Learning: Tell the Difference between Similar Vehicles , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[57]  Chen Huang,et al.  Local Similarity-Aware Deep Feature Embedding , 2016, NIPS.