Deep Bregman Divergence for Contrastive Learning of Visual Representations

Deep Bregman divergence uses neural networks to measure the divergence between data points; it goes beyond Euclidean distance and can capture divergence over distributions. In this paper, we propose deep Bregman divergences for contrastive learning of visual representations, aiming to enhance the contrastive loss used in self-supervised learning by training additional networks based on functional Bregman divergence. In contrast to conventional contrastive learning methods, which rely solely on divergences between single points, our framework can capture the divergence between distributions, which improves the quality of the learned representation. We show that combining the conventional contrastive loss with our proposed divergence loss outperforms the baseline and most previous methods for self-supervised and semi-supervised learning on multiple classification and object detection tasks and datasets. Moreover, the learned representations generalize well when transferred to other datasets and tasks. The source code and our models are available in the supplementary material and will be released with the paper.
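
For orientation, the classical pointwise Bregman divergence generated by a strictly convex function phi is D_phi(x, y) = phi(x) - phi(y) - <grad phi(y), x - y>. The sketch below is only an illustration of that textbook definition, not the deep functional divergence proposed in the paper; the helper names (`bregman`, `sq_norm`, `neg_entropy`) and the example vectors are assumptions made for this example. It shows how the choice of generator moves from squared Euclidean distance to the KL divergence between probability vectors.

```python
import math

def bregman(phi, grad_phi, x, y):
    """Classical Bregman divergence: phi(x) - phi(y) - <grad phi(y), x - y>."""
    inner = sum(g * (xi - yi) for g, xi, yi in zip(grad_phi(y), x, y))
    return phi(x) - phi(y) - inner

# Generator 1 (illustrative): squared Euclidean norm.
# The resulting divergence is the squared Euclidean distance.
def sq_norm(v):
    return sum(vi * vi for vi in v)

def sq_norm_grad(v):
    return [2.0 * vi for vi in v]

# Generator 2 (illustrative): negative entropy.
# For vectors that each sum to 1, the divergence equals KL(p || q).
def neg_entropy(p):
    return sum(pi * math.log(pi) for pi in p)

def neg_entropy_grad(p):
    return [math.log(pi) + 1.0 for pi in p]

# Hypothetical probability vectors, chosen only for the demonstration.
p = [0.7, 0.2, 0.1]
q = [0.5, 0.3, 0.2]

d_euclid = bregman(sq_norm, sq_norm_grad, p, q)
d_kl = bregman(neg_entropy, neg_entropy_grad, p, q)
```

Here `d_euclid` matches the squared distance between `p` and `q`, while `d_kl` matches the sum of `p_i * log(p_i / q_i)`. The same formula thus yields a point-based or a distribution-based measure depending on the generator, which is the flexibility the abstract refers to.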
