VQ-GNN: A Universal Framework to Scale up Graph Neural Networks using Vector Quantization

Most state-of-the-art Graph Neural Networks (GNNs) can be defined as a form of graph convolution, which can be realized by message passing between direct neighbors or beyond. To scale such GNNs to large graphs, various neighbor-, layer-, or subgraph-sampling techniques have been proposed to alleviate the “neighbor explosion” problem by considering only a small subset of the messages passed to the nodes in a mini-batch. However, sampling-based methods are difficult to apply to GNNs that utilize many-hops-away or global context in each layer, show unstable performance across tasks and datasets, and do not speed up model inference. We propose a principled and fundamentally different approach, VQ-GNN, a universal framework to scale up any convolution-based GNN using Vector Quantization (VQ) without compromising performance. In contrast to sampling-based techniques, our approach effectively preserves all the messages passed to a mini-batch of nodes by learning and updating a small number of quantized reference vectors of global node representations, using VQ within each GNN layer. Our framework avoids the “neighbor explosion” problem by combining the quantized representations with a low-rank version of the graph convolution matrix, and we show both theoretically and experimentally that such a compact low-rank version of the gigantic convolution matrix is sufficient. In conjunction with VQ, we design a novel approximated message passing algorithm and a nontrivial back-propagation rule for our framework. Experiments on various types of GNN backbones demonstrate the scalability and competitive performance of our framework on large-graph node classification and link prediction benchmarks.
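To make the core mechanism concrete, below is a minimal PyTorch-style sketch of the idea described above: node representations are quantized against a small codebook, and messages from out-of-mini-batch neighbors are approximated by the codewords of those neighbors instead of their exact features. The function names (`vq_assign`, `ema_update`, `approx_message_passing`), the EMA-style codebook update, and the tensor shapes are illustrative assumptions, not the paper's exact algorithm or its back-propagation rule.

```python
import torch

def vq_assign(h, codebook):
    """Assign each node representation to its nearest codeword (L2 distance)."""
    d = torch.cdist(h, codebook)          # [n, K] distances to K codewords
    return d.argmin(dim=1)                # index of nearest codeword per node

def ema_update(h, idx, codebook, momentum=0.9):
    """Illustrative codebook update: move each codeword toward the mean of its assigned features."""
    with torch.no_grad():
        for k in range(codebook.size(0)):
            mask = idx == k
            if mask.any():
                codebook[k] = momentum * codebook[k] + (1 - momentum) * h[mask].mean(dim=0)
    return codebook

def approx_message_passing(A_in, A_out, h_batch, codebook, idx_out):
    """One layer of approximated message passing for a mini-batch (sketch).

    A_in:     [b, b]  convolution weights among in-batch nodes (exact messages)
    A_out:    [b, m]  convolution weights from out-of-batch neighbors
    h_batch:  [b, d]  exact features of the in-batch nodes
    idx_out:  [m]     codeword indices assigned to the out-of-batch neighbors
    """
    exact = A_in @ h_batch                 # messages from in-batch neighbors, kept exact
    quantized = A_out @ codebook[idx_out]  # out-of-batch neighbors replaced by their codewords
    return exact + quantized
```

The sketch only illustrates the forward approximation; in the actual framework the codebook and assignments also participate in training through the dedicated back-propagation rule mentioned above.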
