NetSMF: Large-Scale Network Embedding as Sparse Matrix Factorization

We study the problem of large-scale network embedding, which aims to learn latent representations for network mining applications. Previous research shows that 1) popular network embedding benchmarks, such as DeepWalk, are in essence implicitly factorizing a matrix with a closed form, and 2) explicitly factorizing such a matrix generates more powerful embeddings than existing methods. However, directly constructing and factorizing this matrix, which is dense, is prohibitively expensive in both time and space, so the approach does not scale to large networks. In this work, we present NetSMF, an algorithm that casts large-scale network embedding as sparse matrix factorization. NetSMF leverages theories from spectral sparsification to efficiently sparsify the aforementioned dense matrix, which significantly improves the efficiency of embedding learning. The sparsified matrix is spectrally close to the original dense one with a theoretically bounded approximation error, which helps maintain the representation power of the learned embeddings. We conduct experiments on networks of various scales and types. Results show that among both popular benchmarks and factorization-based methods, NetSMF is the only method that achieves both high efficiency and effectiveness. We show that NetSMF requires only 24 hours to generate effective embeddings for a large-scale academic collaboration network with tens of millions of nodes, whereas DeepWalk would take months and the dense matrix factorization solution is computationally infeasible. The source code of NetSMF is publicly available.
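To make the density problem concrete, the following minimal sketch builds the closed-form matrix on a toy graph. It assumes the form reported in the earlier matrix-factorization analysis of DeepWalk, namely log(max(1, vol(G)/(bT) · Σ_{r=1..T} (D⁻¹A)^r D⁻¹)), with window size T and b negative samples. All function names are illustrative, the code uses plain Python lists for readability, and it shows only the dense construction that NetSMF is designed to avoid, not NetSMF itself.

```python
import math


def matmul(x, y):
    """Multiply two square matrices stored as lists of rows."""
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def random_walk_sum(adj, window):
    """Return sum_{r=1..window} (D^-1 A)^r; assumes no isolated nodes."""
    n = len(adj)
    deg = [sum(row) for row in adj]
    # Row-normalised transition matrix P = D^-1 A.
    p = [[adj[i][j] / deg[i] for j in range(n)] for i in range(n)]
    power = [row[:] for row in p]
    total = [row[:] for row in p]
    for _ in range(window - 1):
        power = matmul(power, p)
        total = [[total[i][j] + power[i][j] for j in range(n)]
                 for i in range(n)]
    return total


def deepwalk_matrix(adj, window, negative=1.0):
    """Assumed closed form: log(max(1, vol/(b*T) * sum_r P^r D^-1))."""
    n = len(adj)
    deg = [sum(row) for row in adj]
    vol = sum(deg)
    total = random_walk_sum(adj, window)
    scale = vol / (negative * window)
    return [[math.log(max(1.0, scale * total[i][j] / deg[j]))
             for j in range(n)] for i in range(n)]


def count_nonzeros(m):
    """Count entries that are not exactly zero."""
    return sum(1 for row in m for v in row if v != 0)


def path_graph(n):
    """Adjacency matrix of an undirected path on n nodes (toy input)."""
    adj = [[0] * n for _ in range(n)]
    for i in range(n - 1):
        adj[i][i + 1] = adj[i + 1][i] = 1
    return adj


adj = path_graph(8)
walk_sum = random_walk_sum(adj, window=3)
target = deepwalk_matrix(adj, window=3)
print(count_nonzeros(adj), count_nonzeros(walk_sum), len(adj) ** 2)
```

On this 8-node path the adjacency matrix has 14 nonzero entries, while the 3-step random-walk sum already fills 44 of the 64 entries. With realistic window sizes on large networks the sum approaches a fully dense n × n matrix, which is the cost that the sparsification step targets.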
