InstantEmbedding: Efficient Local Node Representations

In this paper, we introduce InstantEmbedding, an efficient method for generating single-node representations using local PageRank computations. We prove that our approach produces globally consistent representations in sublinear time, and we confirm this empirically through extensive experiments on real-world datasets with over a billion edges. To produce a single node's embedding, InstantEmbedding requires over 9,000 times less computation time and over 8,000 times less memory than traditional methods, including DeepWalk, node2vec, VERSE, and FastRP. We also show that our method produces high-quality representations, with results that meet or exceed the state of the art for unsupervised representation learning on tasks such as node classification and link prediction.
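
To make the idea of a single-node embedding built from local PageRank computations concrete, the following is a minimal sketch, not the paper's implementation. It assumes a push-style approximation of personalized PageRank followed by signed feature hashing into a fixed number of dimensions; the abstract specifies neither step. The function names (`local_ppr`, `hashed_embedding`), the parameters (`alpha`, `eps`, `dim`), the log-scaling of scores, and the toy graph are all illustrative assumptions.

```python
import math
import zlib
from collections import deque


def local_ppr(adj, source, alpha=0.15, eps=1e-4):
    """Approximate personalized PageRank of `source` with local pushes.

    Assumed variant: a node u is pushed while its residual is at least
    eps * degree(u). Only nodes near `source` are ever touched.
    """
    p = {}                  # sparse PageRank estimates
    r = {source: 1.0}       # sparse residual mass still to distribute
    queue = deque([source])
    while queue:
        u = queue.popleft()
        deg = len(adj[u])
        ru = r.get(u, 0.0)
        if deg == 0 or ru < eps * deg:
            continue
        p[u] = p.get(u, 0.0) + alpha * ru   # keep a share at u
        r[u] = 0.0
        share = (1.0 - alpha) * ru / deg    # spread the rest to neighbors
        for v in adj[u]:
            before = r.get(v, 0.0)
            r[v] = before + share
            # Enqueue v only when it crosses its push threshold.
            if before < eps * len(adj[v]) <= r[v]:
                queue.append(v)
    return p


def hashed_embedding(ppr, num_nodes, dim=8):
    """Fold a sparse PPR vector into `dim` dimensions by signed hashing.

    The log-scaling and the CRC32-based hashes are assumptions of this sketch.
    """
    emb = [0.0] * dim
    for v, score in ppr.items():
        weight = max(math.log(score * num_nodes), 0.0)
        key = str(v).encode()
        bucket = zlib.crc32(b"dim:" + key) % dim
        sign = 1.0 if zlib.crc32(b"sgn:" + key) & 1 else -1.0
        emb[bucket] += sign * weight
    return emb


# Toy graph: two triangles joined by an edge, plus a separate pair (6, 7).
graph = {
    0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
    3: [2, 4, 5], 4: [3, 5], 5: [3, 4],
    6: [7], 7: [6],
}
ppr = local_ppr(graph, source=0)
embedding = hashed_embedding(ppr, num_nodes=len(graph), dim=8)
```

In this sketch the work done depends on `alpha` and `eps` rather than on the size of the graph, and nodes 6 and 7 are never visited when embedding node 0. Because the hash functions are fixed, embeddings computed separately for different nodes land in the same coordinate system.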
