NimbleCore: A space-efficient external memory algorithm for estimating core numbers

We address the problem of estimating the core numbers of nodes by reading the edges of a large graph stored in external memory. The core number of a node is the largest k such that the node belongs to the k-core, the maximal subgraph in which every node has degree at least k. Core numbers are useful in many graph mining tasks, especially those that involve finding communities of nodes, influential spreaders, and dense subgraphs. Large graphs often do not fit in the memory of a single machine. Existing external-memory solutions give no bounds on the space they require, and in practice they do not scale with the size of the graph. We propose NimbleCore, an iterative external-memory algorithm that estimates core numbers using O(n log dmax) space, where n is the number of nodes and dmax is the maximum node degree in the graph. We also show that NimbleCore requires O(n) space for graphs with power-law degree distributions. Experiments on forty-eight large graphs from various domains demonstrate that NimbleCore achieves space savings of up to 60X while estimating core numbers accurately, with an average relative error below 2.3%.
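To make the definition of a core number concrete, the following minimal sketch computes exact core numbers with the standard in-memory peeling procedure: repeatedly remove a node of minimum remaining degree. It is not NimbleCore. It holds the whole graph in memory, which is exactly what an external-memory algorithm avoids, and the function name `core_numbers` and the edge-list input format are assumptions made only for illustration.

```python
from collections import defaultdict


def core_numbers(edges):
    """Exact core numbers by peeling (illustrative only, not NimbleCore).

    `edges` is an iterable of (u, v) pairs of an undirected graph.
    This simple version is quadratic in the number of nodes; it favors
    clarity over speed.
    """
    # Build adjacency sets, ignoring self-loops and duplicate edges.
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)

    degree = {u: len(nbrs) for u, nbrs in adj.items()}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        # Peel a node with the smallest remaining degree.
        u = min(remaining, key=degree.__getitem__)
        # The core number never decreases along the peeling order.
        k = max(k, degree[u])
        core[u] = k
        remaining.remove(u)
        for w in adj[u]:
            if w in remaining:
                degree[w] -= 1
    return core


# A triangle (a, b, c) with a pendant node d attached to c.
example = core_numbers([("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")])
print(example)
```

In this example the triangle nodes receive core number 2 and the pendant node receives core number 1, since the pendant node is removed as soon as a minimum degree of 2 is required.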
