Community detection in graphs through correlation

Community detection is an important task for social networks, as it helps us understand the functional modules of the whole network. Among community detection methods based on graph structure, modularity-based methods have recently become very popular, but they suffer from the well-known resolution limit problem. This paper connects modularity-based methods with correlation analysis by subtly reformulating their mathematical formulas, and investigates how to make full use of correlation analysis to change the objective function of modularity-based methods, which provides a more natural and effective way to solve the resolution limit problem. In addition, a novel theoretical analysis of the upper bounds of different objective functions helps us understand their bias toward different community sizes, and experiments on both real-life and simulated data validate our findings.
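
To make the link between modularity and correlation analysis concrete, the sketch below computes standard Newman modularity as a sum of per-community terms of the form "observed minus expected". Each term compares the fraction of edges that fall inside a community with the fraction expected if edge endpoints were assigned independently, which has the same shape as a leverage-style correlation measure, P(AB) - P(A)P(B). This is only one illustrative reading of the connection, not necessarily the exact formulation used in the paper; the function name `modularity_as_leverage` and the toy graph are assumptions made for the example.

```python
from collections import defaultdict


def modularity_as_leverage(edges, community):
    """Return per-community modularity terms for an undirected, unweighted graph.

    Hypothetical helper for illustration. For each community c:
      observed = fraction of edges with both endpoints in c
      expected = (fraction of edge endpoints that lie in c) ** 2
    The term observed - expected mirrors a leverage-style measure,
    P(AB) - P(A)P(B); summing the terms gives Newman modularity.
    """
    m = len(edges)
    internal = defaultdict(int)  # edges fully inside each community
    degree = defaultdict(int)    # total degree of each community
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree[cu] += 1
        degree[cv] += 1
        if cu == cv:
            internal[cu] += 1

    terms = {}
    for c in degree:
        observed = internal[c] / m
        expected = (degree[c] / (2 * m)) ** 2
        terms[c] = observed - expected
    return terms


# Toy graph (an assumption for the example): two triangles joined by one edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
community = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}

terms = modularity_as_leverage(edges, community)
q = sum(terms.values())
print(terms)
print(q)
```

Because each community contributes its own term, swapping the difference `observed - expected` for another correlation measure changes the objective function without changing the rest of the computation, which is the kind of substitution the abstract alludes to.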
