Scaling Fine-grained Modularity Clustering for Massive Graphs

Modularity clustering is an essential tool for understanding complicated graphs. However, existing methods are not applicable to massive graphs because of two serious weaknesses: (1) they struggle to fully reproduce ground-truth clusters because of the resolution limit problem, and (2) they are computationally expensive because all nodes and edges must be processed iteratively. This paper proposes gScarf, which outputs fine-grained clusters within a short running time. To overcome these weaknesses, gScarf dynamically prunes unnecessary nodes and edges while ensuring that it still captures fine-grained clusters. Experiments show that gScarf outperforms existing methods in running time while finding clusters with high accuracy.
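The resolution limit mentioned above can be illustrated with a small sketch. It uses the standard Newman modularity, which is not necessarily the objective gScarf optimizes, and a ring of cliques as a toy graph. The helper names `modularity` and `ring_of_cliques` and the graph sizes are assumptions chosen for illustration, not taken from the paper.

```python
from collections import defaultdict


def modularity(edges, cluster_of):
    """Standard Newman modularity Q of a partition of an undirected,
    unweighted graph given as a list of (u, v) edges."""
    m = len(edges)
    intra = defaultdict(int)   # edges with both endpoints in the cluster
    degree = defaultdict(int)  # sum of node degrees per cluster
    for u, v in edges:
        cu, cv = cluster_of[u], cluster_of[v]
        degree[cu] += 1
        degree[cv] += 1
        if cu == cv:
            intra[cu] += 1
    return sum(intra[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


def ring_of_cliques(num_cliques, clique_size):
    """Hypothetical toy graph: complete subgraphs arranged in a ring,
    each joined to the next one by a single bridge edge."""
    edges = []
    for k in range(num_cliques):
        base = k * clique_size
        for i in range(clique_size):
            for j in range(i + 1, clique_size):
                edges.append((base + i, base + j))
        nxt = ((k + 1) % num_cliques) * clique_size
        edges.append((base, nxt))  # bridge to the next clique
    return edges


num_cliques, clique_size = 30, 5
edges = ring_of_cliques(num_cliques, clique_size)
nodes = range(num_cliques * clique_size)

# Fine-grained partition: one cluster per clique (the intuitive ground truth).
fine = {n: n // clique_size for n in nodes}
# Coarse partition: adjacent cliques merged in pairs.
merged = {n: n // (2 * clique_size) for n in nodes}

q_fine = modularity(edges, fine)
q_merged = modularity(edges, merged)
print(f"Q (one cluster per clique): {q_fine:.4f}")    # about 0.8758
print(f"Q (cliques merged in pairs): {q_merged:.4f}")  # about 0.8879
```

In this toy graph the merged partition scores higher than the one-cluster-per-clique partition, so a method that purely maximizes modularity would prefer the coarser clustering even though each clique is an obvious cluster.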
