Conditional Contrastive Learning: Removing Undesirable Information in Self-Supervised Representations

Self-supervised learning is a form of unsupervised learning that leverages rich information in data to learn representations. However, data sometimes contains information that is undesirable for downstream tasks. For instance, gender information may lead to biased decisions on many gender-irrelevant tasks. In this paper, we develop conditional contrastive learning to remove undesirable information from self-supervised representations. To remove the effect of the undesirable variable, our proposed approach conditions on the undesirable variable (i.e., by fixing its variations) during the contrastive learning process. In particular, inspired by the contrastive objective InfoNCE, we introduce Conditional InfoNCE (C-InfoNCE) and its computationally efficient variant, Weak-Conditional InfoNCE (WeaC-InfoNCE), for conditional contrastive learning. We demonstrate empirically that our methods can successfully learn self-supervised representations for downstream tasks while removing a substantial amount of information related to the undesirable variables. We study three scenarios, each with a different type of undesirable variable: task-irrelevant meta-information for self-supervised speech representation learning, sensitive attributes for fair representation learning, and domain specification for multi-domain visual representation learning.
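
To make the conditioning idea concrete, the sketch below computes an InfoNCE-style loss only among examples that share the same value of the undesirable variable, so positive and negative contrasts never mix different values of that variable. This is a minimal PyTorch sketch under stated assumptions, not the paper's reference implementation; the function name conditional_info_nce, the grouping-by-unique-value strategy, and the temperature value are illustrative.

```python
import torch
import torch.nn.functional as F

def conditional_info_nce(h1, h2, z, temperature=0.1):
    """InfoNCE computed within groups that share the same value of the
    undesirable variable z, so contrasts are conditioned on z.

    h1, h2: (N, D) representations of two views of the same N examples.
    z:      (N,)   discrete undesirable variable (e.g., a gender or speaker id).
    NOTE: illustrative sketch, not the authors' released code.
    """
    losses = []
    for value in z.unique():
        # Indices of all examples whose undesirable variable equals `value`.
        idx = (z == value).nonzero(as_tuple=True)[0]
        if idx.numel() < 2:  # need at least one within-group negative
            continue
        a = F.normalize(h1[idx], dim=-1)
        b = F.normalize(h2[idx], dim=-1)
        logits = a @ b.t() / temperature          # within-group similarity matrix
        labels = torch.arange(idx.numel(), device=h1.device)
        losses.append(F.cross_entropy(logits, labels))  # positives on the diagonal
    return torch.stack(losses).mean()
```

Conditioning both the positive and the negative pairs on the undesirable variable, as in this sketch, roughly corresponds to the stricter C-InfoNCE setting; WeaC-InfoNCE relaxes this by requiring only the positive pair to share the same value of the variable, which avoids grouping the negatives and is computationally cheaper.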
