Improving Graph Representation Learning by Contrastive Regularization

Graph representation learning is an important task with applications in areas such as online social networks, e-commerce networks, the WWW, and semantic webs. For unsupervised graph representation learning, many algorithms such as Node2Vec and GraphSAGE use "negative sampling" and/or a noise contrastive estimation loss. These ideas are closely related to contrastive learning, which "contrasts" the representation similarities of semantically similar (positive) node pairs against those of negative pairs. However, despite the success of contrastive learning, we find that directly applying this technique to graph representation learning models (e.g., graph convolutional networks) does not always work. We theoretically analyze the generalization performance and propose a lightweight regularization term that improves it by preventing the norms of the node representations from growing large and from varying widely across nodes. Our experimental results further validate that this regularization term significantly improves representation quality across different node similarity definitions and outperforms state-of-the-art methods.
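As a rough illustration of the idea described above, the following NumPy sketch adds a norm-based penalty to a negative-sampling contrastive loss. The abstract does not give the exact form of the regularization term, so the penalty used here (mean squared norm plus variance of the norms) is an assumption, as are the function names, the `weight` parameter, and the toy node pairs.

```python
import numpy as np


def log_sigmoid(x):
    # Numerically stable log(sigmoid(x)) = -log(1 + exp(-x)).
    return -np.logaddexp(0.0, -x)


def contrastive_loss(z, pos_pairs, neg_pairs):
    # Negative-sampling style loss: dot products of positive pairs are
    # pushed up, dot products of negative pairs are pushed down.
    pos = np.sum(z[pos_pairs[:, 0]] * z[pos_pairs[:, 1]], axis=1)
    neg = np.sum(z[neg_pairs[:, 0]] * z[neg_pairs[:, 1]], axis=1)
    return -(log_sigmoid(pos).mean() + log_sigmoid(-neg).mean())


def norm_penalty(z):
    # ASSUMED form of the regularizer (not taken from the paper):
    # penalize large norms (mean squared norm) and uneven norms (variance).
    norms = np.linalg.norm(z, axis=1)
    return np.mean(norms ** 2) + np.var(norms)


def regularized_loss(z, pos_pairs, neg_pairs, weight=0.1):
    # `weight` is a hypothetical hyperparameter trading off the two terms.
    return contrastive_loss(z, pos_pairs, neg_pairs) + weight * norm_penalty(z)


# Toy example: 6 nodes with 4-dimensional representations.
rng = np.random.default_rng(0)
z = rng.normal(size=(6, 4))
pos_pairs = np.array([[0, 1], [2, 3]])  # e.g. neighboring nodes
neg_pairs = np.array([[0, 4], [2, 5]])  # e.g. randomly sampled nodes

base = contrastive_loss(z, pos_pairs, neg_pairs)
total = regularized_loss(z, pos_pairs, neg_pairs)
print(f"contrastive: {base:.4f}  regularized: {total:.4f}")
```

In a real model, `z` would be the output of an encoder such as a graph convolutional network, and the penalty would be added to the training objective so that its gradient limits how large and how uneven the representation norms can become.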
