A distributed k-core decomposition algorithm on Spark

k-core decomposition is a popular graph analysis method with widespread applications. Thanks to its linear time complexity, it scales to large real-life networks as long as the input graph fits in main memory. For graphs that do not fit in main memory, external-memory approaches and distributed solutions based on iterative MapReduce have been proposed. However, both kinds of solution are slow because of their high disk I/O cost. In this paper, we propose Spark-kCore, a distributed k-core decomposition algorithm that runs on the Spark cluster computing platform. Following the think-like-a-vertex paradigm, the proposed method solves k-core decomposition through message passing, which reduces the I/O cost substantially. Experiments on 15 large real-life networks show that our method is much faster than existing k-core decomposition solutions.
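
As a rough illustration of the vertex-centric message-passing idea, the sketch below simulates it in plain Python on a single machine. It is not the Spark-kCore implementation and does not use Spark. It assumes an undirected simple graph given as a symmetric adjacency dict, and it uses the commonly described locality-based update: each vertex starts from its degree and repeatedly lowers its estimate to the largest k such that at least k neighbors report an estimate of at least k. The names `compute_index`, `core_numbers`, and `example` are hypothetical and chosen only for this example.

```python
from typing import Dict, Hashable, Iterable, List

Graph = Dict[Hashable, List[Hashable]]


def compute_index(neighbor_estimates: Iterable[int], own: int) -> int:
    """Largest k <= own such that at least k neighbors have estimate >= k."""
    counts = [0] * (own + 1)
    for est in neighbor_estimates:
        counts[min(est, own)] += 1  # estimates above `own` cannot raise it
    total = 0
    for k in range(own, 0, -1):
        total += counts[k]  # neighbors with (capped) estimate >= k
        if total >= k:
            return k
    return 0


def core_numbers(adj: Graph) -> Dict[Hashable, int]:
    """Simulate vertex-centric supersteps (assumption: adj is symmetric)."""
    # Each vertex's first estimate of its core number is its degree.
    core = {v: len(ns) for v, ns in adj.items()}
    # known[v][u] holds the latest estimate v has received from neighbor u.
    known: Dict[Hashable, Dict[Hashable, int]] = {v: {} for v in adj}
    # In the first superstep every vertex announces its degree.
    outbox = dict(core)
    while outbox:
        # Deliver messages: only vertices whose estimate changed send.
        receivers = set()
        for v, est in outbox.items():
            for u in adj[v]:
                known[u][v] = est
                receivers.add(u)
        # Vertices that received a message recompute their estimate.
        outbox = {}
        for u in receivers:
            new_est = compute_index(known[u].values(), core[u])
            if new_est < core[u]:
                core[u] = new_est
                outbox[u] = new_est  # notify neighbors next superstep
    return core


# A 4-clique {1, 2, 3, 4} with a path 4 - 5 - 6 hanging off it.
example: Graph = {
    1: [2, 3, 4],
    2: [1, 3, 4],
    3: [1, 2, 4],
    4: [1, 2, 3, 5],
    5: [4, 6],
    6: [5],
}
result = core_numbers(example)
print(result)
```

In this toy graph the clique vertices end with core number 3 and the two path vertices with core number 1. Because a vertex only sends a message when its estimate drops, traffic shrinks as the estimates stabilize; in a distributed setting this is the kind of in-memory exchange that avoids repeated disk passes.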
