Real-Time Graph Partition and Embedding of Large Network

Recently, large-scale networks attract significant attention to analyze and extract the hidden information of big data. Toward this end, graph embedding is a method to embed a high dimensional graph into a much lower dimensional vector space while maximally preserving the structural information of the original network. However, effective graph embedding is particularly challenging when massive graph data are generated and processed for real-time applications. In this paper, we address this challenge and propose a new real-time and distributed graph embedding algorithm (RTDGE) that is capable of distributively embedding a large-scale graph in a streaming fashion. Specifically, our RTDGE consists of the following components: (1) a graph partition scheme that divides all edges into distinct subgraphs, where vertices are associated with edges and may belong to several subgraphs; (2) a dynamic negative sampling (DNS) method that updates the embedded vectors in real-time; and (3) an unsupervised global aggregation scheme that combines all locally embedded vectors into a global vector space. Furthermore, we also build a real-time distributed graph embedding platform based on Apache Kafka and Apache Storm. Extensive experimental results show that RTDGE outperforms existing solutions in terms of graph embedding efficiency and accuracy.

[1]  Christos Faloutsos,et al.  Graph evolution: Densification and shrinking diameters , 2006, TKDD.

[2]  Konstantin Andreev,et al.  Balanced Graph Partitioning , 2004, SPAA '04.

[3]  Rajeev Motwani,et al.  The PageRank Citation Ranking : Bringing Order to the Web , 1999, WWW 1999.

[4]  Huan Liu,et al.  Leveraging social media networks for classification , 2011, Data Mining and Knowledge Discovery.

[5]  Jeffrey Dean,et al.  Distributed Representations of Words and Phrases and their Compositionality , 2013, NIPS.

[6]  Alexander J. Smola,et al.  Distributed large-scale natural graph factorization , 2013, WWW.

[7]  Shi Feng,et al.  Localized matrix factorization for recommendation based on matrix block diagonal forms , 2013, WWW.

[8]  Jure Leskovec,et al.  node2vec: Scalable Feature Learning for Networks , 2016, KDD.

[9]  Jeffrey Dean,et al.  Efficient Estimation of Word Representations in Vector Space , 2013, ICLR.

[10]  Wei Lu,et al.  Deep Neural Networks for Learning Graph Representations , 2016, AAAI.

[11]  Charalampos E. Tsourakakis,et al.  FENNEL: streaming graph partitioning for massive scale graphs , 2014, WSDM.

[12]  Charles Elkan,et al.  Link Prediction via Matrix Factorization , 2011, ECML/PKDD.

[13]  Alberto Montresor,et al.  Distributed Edge Partitioning for Graph Processing , 2014, ArXiv.

[14]  Sherif Sakr,et al.  Large-Scale Graph Processing Using Apache Giraph , 2017, Springer International Publishing.

[15]  S.,et al.  An Efficient Heuristic Procedure for Partitioning Graphs , 2022 .

[16]  Aart J. C. Bik,et al.  Pregel: a system for large-scale graph processing , 2010, SIGMOD Conference.

[17]  Mingzhe Wang,et al.  LINE: Large-scale Information Network Embedding , 2015, WWW.

[18]  Geoffrey E. Hinton,et al.  Stochastic Neighbor Embedding , 2002, NIPS.

[19]  Steven Skiena,et al.  DeepWalk: online learning of social representations , 2014, KDD.

[20]  Sanjay Ghemawat,et al.  MapReduce: Simplified Data Processing on Large Clusters , 2004, OSDI.