GCC: Graph Contrastive Coding for Graph Neural Network Pre-Training

Graph representation learning has emerged as a powerful technique for addressing real-world problems. Its recent developments have benefited various downstream graph learning tasks, such as node classification, similarity search, and graph classification. However, prior work on graph representation learning focuses on domain-specific problems and trains a dedicated model for each graph dataset, and such models are usually not transferable to out-of-domain data. Inspired by recent advances in pre-training for natural language processing and computer vision, we design Graph Contrastive Coding (GCC), a self-supervised graph neural network pre-training framework that captures universal network topological properties across multiple networks. We design GCC's pre-training task as subgraph instance discrimination within and across networks, and we leverage contrastive learning to empower graph neural networks to learn intrinsic and transferable structural representations. We conduct extensive experiments on three graph learning tasks and ten graph datasets. The results show that GCC pre-trained on a collection of diverse datasets can achieve performance competitive with or better than its task-specific and trained-from-scratch counterparts. This suggests that the pre-training and fine-tuning paradigm holds great potential for graph representation learning.
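To make "subgraph instance discrimination with contrastive learning" concrete, here is a minimal sketch of a generic InfoNCE-style contrastive loss. It assumes a batch in which `queries[i]` and `keys[i]` are embeddings of two views of the same subgraph instance (the positive pair), and every other key in the batch serves as a negative. The function names, the temperature `tau = 0.07`, and the way embeddings are produced are illustrative assumptions. This is not GCC's published implementation, and the abstract does not specify the encoder or how views are generated.

```python
import math
from typing import List

Vector = List[float]


def l2_normalize(v: Vector) -> Vector:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def info_nce(queries: List[Vector], keys: List[Vector], tau: float = 0.07) -> float:
    """Mean InfoNCE loss for instance discrimination (illustrative sketch).

    queries[i] and keys[i] are assumed to encode two views of the same
    subgraph instance; keys[j] for j != i are treated as negatives.
    """
    if len(queries) != len(keys) or not queries:
        raise ValueError("need the same non-zero number of queries and keys")
    q = [l2_normalize(v) for v in queries]
    k = [l2_normalize(v) for v in keys]
    total = 0.0
    for i, qi in enumerate(q):
        # Similarity of query i to every key, scaled by the temperature.
        logits = [dot(qi, kj) / tau for kj in k]
        # Log-sum-exp with max subtraction for numerical stability.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(z - m) for z in logits))
        # -log softmax(logits)[i]: small when the positive key scores highest.
        total += log_denom - logits[i]
    return total / len(q)


if __name__ == "__main__":
    views_a = [[1.0, 0.0], [0.0, 1.0]]
    views_b = [[0.9, 0.1], [0.1, 0.9]]
    print(info_nce(views_a, views_b))                  # matched views: low loss
    print(info_nce(views_a, list(reversed(views_b))))  # mismatched: high loss
```

Minimizing this loss pushes each subgraph's two views together and pushes apart views of different subgraphs. This is the discrimination signal that lets a graph encoder learn structural representations without labels. When all embeddings are identical, the loss equals log N for a batch of N instances, which is the chance-level value.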
