Parallel Community Detection on Massive Graphs

Community detection partitions a network into sets of nodes according to their connections. It is an effective way to understand and analyze graph-structured data such as social networks, collaboration networks, and bioinformatic networks. With the rapid growth of social network applications, exploring graphs from a community-level view has become increasingly desirable. However, most existing community detection methods are based on sequential algorithms and are therefore unsuitable for massive graphs. In this paper, we propose a parallel community detection approach named ParCoDe. Like the native sequential algorithm, it uses "community modularity" as its metric. Detection starts from individual nodes and proceeds bottom-up. To improve performance, we also propose an approximate solution that accelerates detection with little loss of accuracy. We have implemented ParCoDe on Giraph. Comprehensive experiments on both real and synthetic datasets demonstrate that ParCoDe scales well and detects communities efficiently.
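The abstract does not define "community modularity". As a minimal sketch, assuming the standard Newman-Girvan modularity Q = sum over communities c of (e_c / m - (d_c / 2m)^2) is meant, where m is the number of edges, e_c the number of edges inside c, and d_c the total degree of nodes in c, the metric can be computed as below. The function name `modularity` and its arguments are illustrative and not taken from ParCoDe.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Newman-Girvan modularity of an undirected, unweighted graph.

    edges: list of (u, v) pairs, each undirected edge listed once
           (assumed to contain no self-loops).
    community_of: dict mapping every node to its community label.
    """
    m = len(edges)
    if m == 0:
        return 0.0
    internal = defaultdict(int)  # edges with both endpoints in community c
    degree = defaultdict(int)    # sum of node degrees in community c
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree[cu] += 1
        degree[cv] += 1
        if cu == cv:
            internal[cu] += 1
    return sum(internal[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


# Two triangles joined by a single bridge edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
singletons = {v: v for v in range(6)}           # bottom-up starting point
two_triangles = {v: v // 3 for v in range(6)}   # nodes 0-2 and nodes 3-5
print(modularity(edges, singletons), modularity(edges, two_triangles))
```

Under this definition, the one-node-per-community starting point has negative modularity, while grouping each triangle into its own community raises it to 5/14. A bottom-up method of the kind described above would look for merges that increase this value.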
