Implicit SVD for Graph Representation Learning

Recent improvements in the performance of state-of-the-art (SOTA) methods for Graph Representation Learning (GRL) have come at the cost of significant computational resources for training, e.g., for calculating gradients via backprop over many data epochs. Meanwhile, Singular Value Decomposition (SVD) can find closed-form solutions to convex problems, using merely a handful of epochs. In this paper, we make GRL more computationally tractable for those with modest hardware. We design a framework that computes the SVD of implicitly defined matrices, and apply this framework to several GRL tasks. For each task, we derive a linear approximation of a SOTA model, in which we design an (expensive-to-store) matrix M and train the model, in closed form, via the SVD of M, without calculating the entries of M. By converging to a unique point in one step, and without calculating gradients, our models show competitive empirical test performance on various graphs, such as article citation and biological interaction networks. More importantly, SVD can initialize a deeper model that is architected to be nonlinear almost everywhere, yet behaves linearly when its parameters reside on a hyperplane, onto which SVD initializes it. The deeper model can then be fine-tuned within only a few epochs. Overall, our procedure trains hundreds of times faster than state-of-the-art methods, while competing on empirical test performance. We open-source our implementation at: https://github.com/samihaija/isvd
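
To make the "SVD of implicitly defined matrices" idea concrete, here is a minimal sketch using SciPy's LinearOperator: the matrix M is exposed only through matrix-vector products, so a truncated SVD can be computed without ever materializing its entries. The choice M = A Aᵀ, the matrix sizes, and the rank k = 32 are illustrative assumptions, not details taken from the paper (see the repository above for the authors' actual framework).

```python
# Minimal sketch (not the authors' implementation) of a truncated SVD over an
# implicitly defined matrix, using SciPy. The choice M = A @ A.T, the sizes,
# and the rank k are illustrative assumptions only.
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import LinearOperator, svds

n = 1000
# Stand-in for a sparse graph matrix (e.g., an adjacency matrix).
A = sp.random(n, n, density=0.01, format="csr", random_state=0)

# Expose M = A @ A.T only through matrix-vector products, so the
# (potentially dense) n-by-n matrix M is never stored.
def matvec(v):
    return A @ (A.T @ v)

M = LinearOperator(shape=(n, n), matvec=matvec, rmatvec=matvec,  # M is symmetric
                   dtype=np.float64)

# Rank-k truncated SVD computed from matvec products alone.
U, s, Vt = svds(M, k=32)
```

Iterative solvers such as the Lanczos-style routine behind svds only ever touch M through these products, which is what makes the implicit formulation tractable even when M itself would be too expensive to store.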
