K-core decomposition on super large graphs with limited resources

K-core decomposition is a commonly used metric for analyzing graph structure and for studying the relative importance of nodes in complex graphs. Recent years have seen rapid growth in the scale of graphs, especially in industrial settings. For example, our industrial partner runs popular social applications with billions of users and is able to gather a rich set of user data. As a result, applying K-core decomposition to large graphs has attracted increasing attention from both academia and industry. A simple but effective way to deal with large graphs is to process them in a distributed setting, and several distributed K-core decomposition algorithms have been proposed. Despite their effectiveness, we observe both experimentally and theoretically that these algorithms consume too many resources and become unstable on super-large-scale graphs, especially when the available resources are limited. In this paper, we address such super-large-scale graphs and propose a divide-and-conquer strategy on top of the distributed K-core decomposition algorithm. We evaluate our approach on three large graphs. The experimental results show that resource consumption is significantly reduced and that computation on large-scale graphs becomes more stable than with existing methods. For example, with our divide-and-conquer technique, the distributed K-core decomposition algorithm scales to a graph with 136 billion edges without losing correctness.
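For readers unfamiliar with the underlying quantity, the following is a minimal single-machine sketch of K-core decomposition by iterative peeling. It is not the distributed or divide-and-conquer algorithm of this paper, only an illustration of what a core number is; the function name `core_numbers` and the edge-list input format are assumptions made for this example.

```python
from collections import defaultdict


def core_numbers(edges):
    """Return {node: core number} for an undirected simple graph.

    Illustrative peeling sketch (hypothetical helper, not the paper's method):
    repeatedly remove a node of minimum remaining degree; the largest minimum
    degree seen so far is the core number of the removed node.
    """
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)

    degree = {node: len(nbrs) for node, nbrs in adj.items()}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        # Linear scan for the minimum-degree node: simple, but quadratic overall.
        node = min(remaining, key=degree.__getitem__)
        k = max(k, degree[node])
        core[node] = k
        remaining.remove(node)
        for nbr in adj[node]:
            if nbr in remaining:
                degree[nbr] -= 1
    return core


# A triangle (a, b, c) with one pendant node d attached to a.
example = core_numbers([("a", "b"), ("b", "c"), ("a", "c"), ("a", "d")])
print(example)
```

In this example the triangle nodes end up in the 2-core, while the pendant node belongs only to the 1-core. The linear scan for the minimum-degree node keeps the sketch short; bucket-based implementations run in time linear in the number of edges.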
