Streaming Algorithms for k-core Decomposition

A k-core of a graph is a maximal connected subgraph in which every vertex is adjacent to at least k other vertices of the subgraph. k-core decomposition is widely used in large-scale network analysis, with applications including community detection, protein function prediction, visualization, and speeding up the solution of NP-hard problems on real networks, such as maximal clique finding. In many real-world applications, networks change over time, so efficient incremental algorithms for streaming graph data are essential. In this paper, we propose the first incremental k-core decomposition algorithms for streaming graph data. These algorithms locate a small subgraph that is guaranteed to contain every vertex whose maximum k-core value must be updated, and then process only this subgraph to update the k-core decomposition. We evaluate the algorithms on different types of real and synthetic graphs at different scales, and the results show a significant reduction in run-time compared to non-incremental alternatives. For a graph of 16 million vertices, we observe speedups of up to a factor of one million relative to the non-incremental algorithms.
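To make the definition concrete, the following is a minimal sketch of the non-incremental baseline: the standard peeling procedure, which repeatedly removes a vertex of minimum remaining degree. It is not the incremental algorithm proposed in the paper. The function name `core_numbers` and the edge-list input format are assumptions made for illustration, and this simple version takes quadratic time in the number of vertices rather than using a linear-time bucket structure.

```python
from collections import defaultdict


def core_numbers(edges):
    """Return {vertex: maximum k-core value} for an undirected simple graph.

    Illustrative peeling sketch (hypothetical helper, not the paper's method):
    repeatedly remove a vertex of minimum remaining degree.
    """
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)

    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        # Linear scan for the minimum-degree vertex: simple, but O(n) per step.
        v = min(remaining, key=degree.__getitem__)
        # The core value never decreases as peeling proceeds.
        k = max(k, degree[v])
        core[v] = k
        remaining.remove(v)
        for w in adj[v]:
            if w in remaining:
                degree[w] -= 1
    return core


# A triangle (1, 2, 3) with a pendant vertex 4 attached to vertex 3.
edges = [(1, 2), (2, 3), (1, 3), (3, 4)]
before = core_numbers(edges)

# Non-incremental update: after two edge insertions, recompute from scratch.
after = core_numbers(edges + [(4, 1), (4, 2)])
```

In this example the two insertions turn the graph into a complete graph on four vertices, so every core value changes. In a large graph a single insertion typically affects only a few vertices, and recomputing from scratch as above is the cost that an incremental approach aims to avoid.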
