Accurate and Scalable Graph Neural Networks for Billion-Scale Graphs

Graph Neural Networks (GNNs) have been successfully applied to a variety of graph analysis tasks. Recent studies have shown that decoupling neighbor aggregation from feature transformation helps GNNs scale to large graphs. However, very large graphs, with billions of nodes and millions of features, remain beyond the capacity of most existing GNNs. In addition, when only a small number of nodes in a large graph (called target nodes) are of interest, existing GNNs infer their labels inefficiently: they must propagate and aggregate either node features or predicted labels over the whole graph, a cost that is disproportionate to the few target nodes. To address these challenges, we propose COSAL, a novel, scalable, and effective GNN framework. COSAL replaces the expensive aggregation step with an efficient proximate node selection mechanism that picks out the $K$ most important nodes for each target node according to the graph topology. We further propose a fine-grained neighbor importance quantification strategy to enhance the expressive power of COSAL. Empirical results show that COSAL achieves superior accuracy, training speed, and partial inference efficiency. In particular, in terms of node classification accuracy, COSAL outperforms the baselines by significant margins of 2.22%, 2.23%, and 3.95% on the large graph datasets Amazon2M, MAG-Scholar-C, and ogbn-papers100M, respectively. Code is available at https://github.com/joyce-x/COSAL.
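The abstract does not specify how COSAL scores node importance, so the following is only a minimal sketch of the general idea it describes. For each target node, we select a small, topology-ranked set of $K$ nodes and aggregate features over that set alone, so inference touches a neighborhood of the target rather than the whole graph. As an assumption, the sketch substitutes approximate Personalized PageRank computed by local forward push, a technique used by earlier scalable GNNs such as PPRGo, for COSAL's own selection mechanism. It does not implement COSAL's fine-grained neighbor importance quantification. The function names `approx_ppr`, `select_top_k`, and `aggregate` and the parameters `alpha` and `eps` are hypothetical.

```python
from collections import defaultdict
import heapq


def approx_ppr(adj, source, alpha=0.15, eps=1e-4):
    """Estimate Personalized PageRank from `source` by local forward push.

    Assumption: this stands in for COSAL's topology-based importance score.
    adj:   dict mapping every node to a list of its neighbors (undirected).
    alpha: restart probability of the random walk.
    eps:   a node is pushed while its residual >= eps * degree.
    Only nodes near `source` are touched, independent of total graph size.
    """
    p = defaultdict(float)  # settled PPR estimates (lower bounds)
    r = defaultdict(float)  # residual probability mass not yet pushed
    r[source] = 1.0
    stack = [source]
    while stack:
        u = stack.pop()
        deg = len(adj[u])
        if deg == 0:
            # Isolated node: treat it as having a self-loop, so all of its
            # residual mass eventually settles on the node itself.
            p[u] += r[u]
            r[u] = 0.0
            continue
        if r[u] < eps * deg:
            continue  # residual below the push threshold; nothing to do
        mass = r[u]
        r[u] = 0.0
        p[u] += alpha * mass  # keep the restart share at u
        share = (1.0 - alpha) * mass / deg
        for v in adj[u]:
            before = r[v]
            r[v] += share
            threshold = eps * len(adj[v])
            # Enqueue v only when its residual first crosses the threshold.
            if before < threshold <= r[v]:
                stack.append(v)
    return dict(p)


def select_top_k(adj, target, k, alpha=0.15, eps=1e-4):
    """Return the k highest-scoring (node, score) pairs for `target`, best first."""
    scores = approx_ppr(adj, target, alpha, eps)
    return heapq.nlargest(k, scores.items(), key=lambda item: item[1])


def aggregate(features, selected):
    """Score-weighted average of the selected nodes' feature vectors."""
    total = sum(score for _, score in selected)
    dim = len(features[selected[0][0]])
    out = [0.0] * dim
    for node, score in selected:
        weight = score / total
        for i, x in enumerate(features[node]):
            out[i] += weight * x
    return out


if __name__ == "__main__":
    # Toy graph: a star centered on node 0, plus a tail 4-5.
    adj = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0, 5], 5: [4]}
    features = {v: [float(v), 1.0] for v in adj}
    top = select_top_k(adj, target=0, k=3)
    print(top)
    print(aggregate(features, top))
```

Because aggregation is decoupled from feature transformation, the aggregated vector for each target could then be passed to any classifier, such as an MLP. The push threshold `eps` trades accuracy for speed. On an undirected graph, each estimate undershoots the true PPR score by less than `eps` times the node's degree.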