Fast Community Detection in Large Weighted Networks Using GraphX in the Cloud

Identifying communities in large weighted networks is of crucial importance because it helps uncover a priori unknown functional modules, such as topics in information networks or cyber-communities in social networks. However, typical networks, such as social network services or the World Wide Web, now contain millions of nodes, which makes analyzing them computationally expensive. This calls for feasible methods and readily available computing platforms that can retrieve their structure efficiently. To address this problem, we propose Fast Community Detection (FastCD), an algorithm based on modularity optimization that readily supports parallel computation. We implement FastCD with GraphX, an embedded graph processing framework built on top of Apache Spark. Comprehensive experiments on a 16-node cluster (32 vCPUs) on Amazon EC2 indicate that FastCD not only outperforms state-of-the-art algorithms in computation time but also maintains the accuracy of the solutions on different real-world networks commonly used for efficiency comparison.
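
The abstract does not spell out the objective that FastCD optimizes, so the following is only a minimal sketch of the standard weighted modularity score (Newman's definition for weighted networks), which modularity-based methods typically try to maximize. The function name `weighted_modularity`, the edge-list input format, and the toy graph are assumptions made for illustration; they are not taken from the paper or from GraphX.

```python
from collections import defaultdict


def weighted_modularity(edges, community):
    """Weighted modularity Q of a partition of an undirected graph.

    edges:     iterable of (u, v, w) tuples, each undirected edge listed once
    community: dict mapping every node to its community label
    (Both input formats are assumptions for this sketch.)
    """
    strength = defaultdict(float)  # sum of edge weights incident to each node
    internal = defaultdict(float)  # total weight of edges inside each community
    total = 0.0                    # m: sum of all edge weights

    for u, v, w in edges:
        total += w
        strength[u] += w
        strength[v] += w
        if community[u] == community[v]:
            internal[community[u]] += w

    if total == 0:
        return 0.0

    # Sum of node strengths per community.
    strength_sum = defaultdict(float)
    for node, k in strength.items():
        strength_sum[community[node]] += k

    # Q = sum over communities of (internal weight / m) - (strength sum / 2m)^2
    return sum(
        internal[c] / total - (strength_sum[c] / (2 * total)) ** 2
        for c in strength_sum
    )


# Toy example: two unit-weight triangles joined by a single bridge edge.
toy_edges = [
    (0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0),
    (3, 4, 1.0), (4, 5, 1.0), (3, 5, 1.0),
    (2, 3, 1.0),
]
toy_partition = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
print(weighted_modularity(toy_edges, toy_partition))
```

Placing each triangle in its own community scores higher than placing all six nodes in one community, which scores zero. That difference is the signal a modularity-optimizing method exploits when deciding whether to merge or split groups.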
