FREDE: Linear-Space Anytime Graph Embeddings

Low-dimensional representations, or embeddings, of a graph's nodes facilitate data mining tasks. Known embedding methods explicitly or implicitly rely on a similarity measure among nodes. As the similarity matrix is quadratic in the number of nodes, a tradeoff between space complexity and embedding quality arises: past research initially opted for heuristics and linear-transform factorizations, which allow for linear space but compromise on quality, while recent research has also proposed a quadratic-space solution as a viable option. In this paper we observe that embedding methods effectively aim to preserve the covariance among the rows of a similarity matrix, and raise the question: is there a method that combines (i) linear space complexity, (ii) a nonlinear transform as its basis, and (iii) nontrivial quality guarantees? We answer this question in the affirmative with FREDE (FREquent Directions Embedding), a sketching-based method that iteratively improves on quality while processing rows of the similarity matrix individually. Thereby, at any iteration, it provides column-covariance approximation guarantees that, in due course, become almost indistinguishable from those of the optimal row-covariance approximation by SVD. Our experimental evaluation on variably sized networks shows that FREDE performs as well as SVD and competitively against current state-of-the-art methods in diverse data mining tasks, even when it derives an embedding based on only 10% of node similarities.
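FREDE is described above as a sketching method. It streams the rows of a similarity matrix and keeps a covariance guarantee at every step. The following is a minimal sketch of the underlying Frequent Directions routine in Python with NumPy, offered as an illustration and not as the authors' implementation. It rests on several assumptions. The names `FrequentDirections`, `update` and `embed` are hypothetical. A random nonnegative matrix stands in for the nonlinearly transformed similarity matrix, whose exact transform the abstract does not specify. The `embed` readout uses the top-k right singular vectors of the sketch, scaled by the square roots of their singular values. This is one common SVD-style convention and may differ from FREDE's.

```python
import numpy as np


class FrequentDirections:
    """Frequent Directions sketch (Liberty; Ghashami et al.).

    Streams the rows of an n x d matrix A and keeps an ell x d sketch B such
    that, after every update, A^T A - B^T B is positive semidefinite and
    ||A^T A - B^T B||_2 <= ||A||_F^2 / ell.
    """

    def __init__(self, d: int, ell: int) -> None:
        if not 0 < ell <= d:
            raise ValueError("need 0 < ell <= d")
        self.ell = ell
        self.B = np.zeros((ell, d))
        self.next_zero = 0  # index of the first all-zero row of B

    def update(self, row: np.ndarray) -> None:
        """Absorb one row of A; the sketch stays valid after every call."""
        self.B[self.next_zero] = row
        self.next_zero += 1
        if self.next_zero == self.ell:  # no zero row left: shrink
            _, s, vt = np.linalg.svd(self.B, full_matrices=False)
            delta = s[-1] ** 2  # smallest squared singular value
            shrunk = np.sqrt(np.maximum(s**2 - delta, 0.0))
            self.B = shrunk[:, None] * vt  # rows sorted by singular value
            # Trailing rows with shrunk == 0 (at least the last) are now zero.
            self.next_zero = int(np.count_nonzero(shrunk))

    def embed(self, k: int) -> np.ndarray:
        """Hypothetical readout: one k-dim vector per column of A, from the
        top-k right singular vectors of B scaled by sqrt(singular value)."""
        _, s, vt = np.linalg.svd(self.B, full_matrices=False)
        return vt[:k].T * np.sqrt(s[:k])


# Usage: a random nonnegative matrix stands in for the (transformed)
# n x n node-similarity matrix; its rows arrive one at a time.
rng = np.random.default_rng(0)
n = 200
A = rng.random((n, n))
fd = FrequentDirections(d=n, ell=20)
for i, row in enumerate(A):
    fd.update(row)
    if i + 1 == n // 10:  # anytime: a usable embedding after 10% of the rows
        early = fd.embed(k=8)

err = np.linalg.norm(A.T @ A - fd.B.T @ fd.B, 2)
bound = np.linalg.norm(A, "fro") ** 2 / fd.ell
print(early.shape, err <= bound)
```

Each shrink step subtracts the smallest squared singular value from all of them. This zeroes at least one row and keeps B^T B below A^T A in the positive-semidefinite order. Summing these subtractions bounds the column-covariance error by ||A||_F^2/ℓ. The bound holds after every processed row, not only at the end, which is what makes the snapshot after 10% of the rows usable. That snapshot only loosely mirrors the 10% setting in the abstract.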
