Information Theoretic Criteria for Community Detection

Many algorithms for finding community structure in graphs search for a partition that maximizes modularity. However, recent work has identified two important limitations of modularity as a community quality criterion: a resolution limit; and a bias towards finding equal-sized communities. Information-theoretic approaches that search for partitions that minimize description length are a recent alternative to modularity. This paper shows that two information-theoretic algorithms are themselves subject to a resolution limit, identifies the component of each approach that is responsible for the resolution limit, proposes a variant, SGE (Sparse Graph Encoding), that addresses this limitation, and demonstrates on three artificial data sets that (1) SGE does not exhibit a resolution limit on sparse graphs in which other approaches do, and that (2) modularity and the compression-based algorithms, including SGE, behave similarly on graphs not subject to the resolution limit.

[1]  Martin Rosvall,et al.  An information-theoretic framework for resolving community structure in complex networks , 2007, Proceedings of the National Academy of Sciences.

[2]  John Yen,et al.  Probabilistic Community Discovery Using Hierarchical Latent Gaussian Mixture Model , 2007, AAAI.

[3]  David D. Jensen,et al.  Graph clustering with network structure indices , 2007, ICML '07.

[4]  Weixiong Zhang,et al.  Identifying network communities with a high resolution. , 2007, Physical review. E, Statistical, nonlinear, and soft matter physics.

[5]  J. Rissanen A UNIVERSAL PRIOR FOR INTEGERS AND ESTIMATION BY MINIMUM DESCRIPTION LENGTH , 1983 .

[6]  Haluk Bingol,et al.  Community Detection in Complex Networks Using Genetic Algorithms , 2006, 0711.0491.

[7]  William M. Rand,et al.  Objective Criteria for the Evaluation of Clustering Methods , 1971 .

[8]  Deepayan Chakrabarti,et al.  AutoPart: Parameter-Free Graph Partitioning and Outlier Detection , 2004, PKDD.

[9]  M E J Newman,et al.  Finding and evaluating community structure in networks. , 2003, Physical review. E, Statistical, nonlinear, and soft matter physics.

[10]  Philip S. Yu,et al.  GraphScope: parameter-free mining of large time-evolving graphs , 2007, KDD '07.

[11]  David L. Dowe,et al.  Minimum Message Length and Kolmogorov Complexity , 1999, Comput. J..

[12]  James C. French,et al.  Clustering large datasets in arbitrary metric spaces , 1999, Proceedings 15th International Conference on Data Engineering (Cat. No.99CB36337).

[13]  S. Fortunato,et al.  Resolution limit in community detection , 2006, Proceedings of the National Academy of Sciences.

[14]  James P. Bagrow Evaluating local community methods in networks , 2007, 0706.3880.

[15]  Zohar Nussinov,et al.  An Improved Potts Model Applied to Community Detection , 2008 .

[16]  Albert,et al.  Emergence of scaling in random networks , 1999, Science.

[17]  P. Ronhovde,et al.  A highly accurate and resolution-limit-free Potts model for community detection , 2008 .

[18]  M. A. Muñoz,et al.  Journal of Statistical Mechanics: An IOP and SISSA journal Theory and Experiment Detecting network communities: a new systematic and efficient algorithm , 2004 .

[19]  M E J Newman,et al.  Fast algorithm for detecting community structure in networks. , 2003, Physical review. E, Statistical, nonlinear, and soft matter physics.

[20]  Bin Wu,et al.  Community detection in large-scale social networks , 2007, WebKDD/SNA-KDD '07.

[21]  Tina Eliassi-Rad,et al.  Finding Mixed-Memberships in Social Networks , 2008, AAAI Spring Symposium: Social Information Processing.

[22]  Mark E. J. Newman,et al.  Power-Law Distributions in Empirical Data , 2007, SIAM Rev..