Unsupervised Deep Metric Learning via Auxiliary Rotation Loss

Deep metric learning is an important area due to its applicability to many domains, such as image retrieval and person re-identification. The main drawback of such models is their need for labeled data. In this work, we propose to generate pseudo-labels for deep metric learning directly from clustering assignments, and we introduce unsupervised deep metric learning (UDML) regularized by a self-supervision (SS) task. In particular, we propose to regularize the training process by predicting image rotations. Our method (UDML-SS) jointly learns discriminative embeddings, unsupervised clustering assignments of the embeddings, and a self-supervised pretext task. UDML-SS iteratively clusters the embeddings using a traditional clustering algorithm (e.g., k-means) and samples training pairs for metric learning based on the cluster assignments, while optimizing the self-supervised pretext task in a multi-task fashion. The role of self-supervision is to stabilize the training process and to encourage the model to learn meaningful feature representations that are not distorted by unreliable clustering assignments. The proposed method performs well on standard benchmarks for metric learning, where it outperforms current state-of-the-art approaches by a large margin, and it also shows competitive performance with various metric learning loss functions.
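The abstract describes one training step as: cluster the current embeddings, use the cluster ids as pseudo-labels, sample positive and negative pairs from them for a metric loss, and add a rotation-prediction loss. The NumPy-only sketch below illustrates those pieces under stated assumptions; it is not the authors' implementation. The farthest-point k-means initialization, the contrastive loss (one of several metric losses the method is said to work with), the even positive/negative pair split, the weight `lam`, and every function name are our own illustrative choices, and random logits stand in for a rotation-classifier head.

```python
import numpy as np


def kmeans(x, k, n_iter=20, seed=0):
    """Lloyd's k-means; the returned cluster ids serve as pseudo-labels."""
    rng = np.random.default_rng(seed)
    # Farthest-point initialization (an assumed, simple stand-in for k-means++).
    centers = [x[rng.integers(len(x))]]
    for _ in range(1, k):
        d = np.min([((x - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(x[d.argmax()])
    centers = np.array(centers, dtype=float)

    def assign(c):
        # Index of the nearest center (squared Euclidean distance) for each row.
        return ((x[:, None, :] - c[None, :, :]) ** 2).sum(axis=-1).argmin(axis=1)

    labels = assign(centers)
    for _ in range(n_iter):
        for j in range(k):
            members = x[labels == j]
            if len(members) > 0:  # keep the old center if a cluster is empty
                centers[j] = members.mean(axis=0)
        labels = assign(centers)
    return labels, centers


def sample_pairs(labels, n_pairs, rng):
    """Alternate positive (same pseudo-label) and negative (different) pairs."""
    idx = np.arange(len(labels))
    counts = np.bincount(labels)
    pos_anchors = idx[counts[labels] >= 2]
    if len(pos_anchors) == 0 or (counts > 0).sum() < 2:
        raise ValueError("need a cluster with >= 2 members and >= 2 clusters")
    i_idx, j_idx, same = [], [], []
    for p in range(n_pairs):
        if p % 2 == 0:  # positive pair from the same cluster
            i = rng.choice(pos_anchors)
            j = rng.choice(idx[(labels == labels[i]) & (idx != i)])
        else:  # negative pair from different clusters
            i = rng.integers(len(labels))
            j = rng.choice(idx[labels != labels[i]])
        i_idx.append(i)
        j_idx.append(j)
        same.append(p % 2 == 0)
    return np.array(i_idx), np.array(j_idx), np.array(same)


def contrastive_loss(emb, i, j, same, margin=1.0):
    """Contrastive loss: pull positives together, push negatives past the margin."""
    d = np.linalg.norm(emb[i] - emb[j], axis=1)
    pos = same * d ** 2
    neg = (~same) * np.maximum(0.0, margin - d) ** 2
    return float((pos + neg).mean())


def rotation_batch(images):
    """Rotate each square image by 0/90/180/270 degrees; target = number of quarter turns."""
    rotated, targets = [], []
    for img in images:
        for r in range(4):
            rotated.append(np.rot90(img, k=r))
            targets.append(r)
    return np.stack(rotated), np.array(targets)


def cross_entropy(logits, targets):
    """Mean softmax cross-entropy for the 4-way rotation classifier."""
    z = logits - logits.max(axis=1, keepdims=True)  # numerically stable log-softmax
    log_p = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return float(-log_p[np.arange(len(targets)), targets].mean())


# Toy usage: two well-separated groups of 8-d "embeddings".
rng = np.random.default_rng(0)
emb = np.concatenate([rng.normal(0.0, 0.1, (20, 8)), rng.normal(3.0, 0.1, (20, 8))])
labels, _ = kmeans(emb, k=2)
i, j, same = sample_pairs(labels, n_pairs=16, rng=rng)
metric = contrastive_loss(emb, i, j, same, margin=1.0)

imgs, rot_targets = rotation_batch(rng.normal(size=(5, 32, 32)))
logits = rng.normal(size=(len(rot_targets), 4))  # stand-in for a rotation head
lam = 1.0  # hypothetical weight of the self-supervised term
total = metric + lam * cross_entropy(logits, rot_targets)
```

In a real training loop the embeddings and the rotation logits would both come from the same network, so the rotation term shapes the shared features even when the k-means pseudo-labels are noisy; re-clustering periodically refreshes the pseudo-labels as the embedding improves.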
