K-Core Decomposition of Large Networks on a Single PC

Studying the topology of a network is critical to inferring underlying dynamics such as tolerance to failure, group behavior and spreading patterns. k-core decomposition is a well-established metric that partitions a graph into layers, from external to more central vertices. In this paper we explore whether the k-core decomposition of large networks can be computed on a consumer-grade PC. We feature implementations of the "vertex-centric" distributed protocol introduced by Montresor, De Pellegrini and Miorandi on GraphChi and Webgraph. We also present an accurate implementation of the Batagelj and Zaversnik algorithm for k-core decomposition in Webgraph. With our implementations, we show that a single consumer-level machine can efficiently handle networks of billions of edges within reasonable time, and can produce excellent approximations in only a fraction of the execution time. To the best of our knowledge, our biggest graphs are considerably larger than the graphs considered in the literature. Next, we present an optimized implementation of an external-memory algorithm (EMcore) by Cheng, Ke, Chu, and Ozsu. We show that this algorithm also performs well on large datasets; however, it cannot predict whether a given memory budget is sufficient for a new dataset. We present a thorough analysis of all algorithms and conclude that it is viable to compute the k-core decomposition of large networks on a consumer-grade PC.
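
To make the notion of core numbers concrete, the following is a minimal in-memory sketch of the bin-sort peeling idea behind the Batagelj and Zaversnik algorithm. It is illustrative only and is not the Webgraph implementation described in the paper: the function name `core_numbers` and the adjacency-list input format (vertices numbered 0 to n-1, undirected, no self-loops or duplicate edges) are assumptions made for this example.

```python
def core_numbers(adj):
    """Return the core number of every vertex.

    adj: list of neighbor lists for an undirected simple graph
         (hypothetical input format chosen for this sketch).
    """
    n = len(adj)
    if n == 0:
        return []
    deg = [len(nbrs) for nbrs in adj]
    max_deg = max(deg)

    # Count vertices per degree, then turn counts into bin start offsets.
    counts = [0] * (max_deg + 1)
    for d in deg:
        counts[d] += 1
    bin_start = [0] * (max_deg + 1)
    total = 0
    for d in range(max_deg + 1):
        bin_start[d] = total
        total += counts[d]

    # order: vertices sorted by current degree; pos: index of each vertex in order.
    order = [0] * n
    pos = [0] * n
    next_free = bin_start[:]
    for v in range(n):
        pos[v] = next_free[deg[v]]
        order[pos[v]] = v
        next_free[deg[v]] += 1

    # Peel vertices in increasing degree order. When v is removed, each
    # neighbor with a larger current degree moves down by one bin.
    for i in range(n):
        v = order[i]
        for u in adj[v]:
            if deg[u] > deg[v]:
                du = deg[u]
                pu = pos[u]
                pw = bin_start[du]      # first slot of u's current bin
                w = order[pw]
                if u != w:              # swap u to the front of its bin
                    order[pu], order[pw] = w, u
                    pos[u], pos[w] = pw, pu
                bin_start[du] += 1      # shrink the bin so u falls into bin du - 1
                deg[u] -= 1

    # After peeling, deg[v] holds the core number of v.
    return deg


# A triangle (0, 1, 2), a pendant vertex 3 attached to 0, and an isolated vertex 4.
example = [[1, 2, 3], [0, 2], [0, 1], [0], []]
cores = core_numbers(example)
```

In this example the triangle vertices end up in the 2-core, the pendant vertex in the 1-core, and the isolated vertex in the 0-core. The sketch keeps the whole graph in memory, so it sidesteps the memory constraints that are the actual subject of the paper.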
