From Canonical Correlation Analysis to Self-supervised Graph Neural Networks

We introduce a conceptually simple yet effective model for self-supervised representation learning on graph data. Like prior methods, it generates two views of an input graph through data augmentation. However, unlike contrastive methods that focus on instance-level discrimination, we optimize a novel feature-level objective inspired by classical Canonical Correlation Analysis. Compared with other works, our approach requires no parameterized mutual information estimator, no additional projector, no asymmetric architecture, and, most importantly, no negative samples, which can be costly to construct. We show that the new objective essentially 1) discards augmentation-variant information by learning invariant representations, and 2) prevents degenerate solutions by decorrelating features across dimensions. Our theoretical analysis further shows that the objective can equivalently be seen as an instantiation of the Information Bottleneck principle in the self-supervised setting. Despite its simplicity, our method performs competitively on seven public graph datasets. The code is available at: https://github.com/hengruizhang98/CCA-SSG.
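To make the two parts of the objective concrete, here is a minimal PyTorch sketch of a CCA-style feature-level loss of the kind the abstract describes: an invariance term pulling the two views' standardized embeddings together, plus a decorrelation term pushing each view's feature covariance toward the identity. The function name, the default λ, and the exact standardization are illustrative assumptions and may differ from the released code in the repository above.

```python
import torch

def cca_ssg_loss(z1: torch.Tensor, z2: torch.Tensor, lam: float = 1e-3) -> torch.Tensor:
    """CCA-inspired feature-level objective over two views' node embeddings.

    z1, z2: (N, D) representations of the same N nodes under two augmentations.
    """
    n, d = z1.shape
    # Standardize each feature dimension and scale by 1/sqrt(N), so that
    # z.T @ z is the (D x D) empirical feature covariance matrix.
    z1 = (z1 - z1.mean(dim=0)) / (z1.std(dim=0) * n ** 0.5)
    z2 = (z2 - z2.mean(dim=0)) / (z2.std(dim=0) * n ** 0.5)

    eye = torch.eye(d, device=z1.device)
    # Invariance term: pull the two views together, discarding
    # augmentation-variant information.
    invariance = (z1 - z2).pow(2).sum()
    # Decorrelation term: push each view's feature covariance toward the
    # identity, preventing degenerate (collapsed) solutions.
    decorrelation = ((z1.T @ z1) - eye).pow(2).sum() + ((z2.T @ z2) - eye).pow(2).sum()
    return invariance + lam * decorrelation
```

Note that with λ = 0 the invariance term alone admits a collapsed solution in which every node maps to the same vector; the decorrelation term rules this out by requiring the D feature dimensions to have unit variance and be mutually uncorrelated, which is why no negative samples, projector, or asymmetric architecture is needed.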
