A Quality Measure for Multi-Level Community Structure

Mining relational data often boils down to computing clusters, that is, finding sub-communities of data elements that form cohesive sub-units while being well separated from one another. The clusters themselves are sometimes termed "communities", and the way clusters relate to one another is often referred to as a "community structure". We study a modularity criterion, MQ, introduced by Mancoridis et al. to infer community structure on relational data. We prove a fundamental and useful property of MQ: it can be approximated by a Gaussian distribution, which makes it preferable to less focused optimization criteria for graph clustering. Relying on the fact that MQ is approximately Gaussian, it becomes possible to compare two different clusterings of the same graph, as well as to assess the overall quality of a given clustering. Moreover, we introduce a generalization that extends MQ to hierarchical clusterings of graphs and reduces to the original MQ when the hierarchy becomes flat.
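The abstract does not restate the definition of MQ. As a minimal sketch, assuming the formulation commonly attributed to Mancoridis et al. for a directed graph split into k clusters (mean intra-connectivity A_i = mu_i / N_i^2 minus mean inter-connectivity E_ij = eps_ij / (2 N_i N_j)), a flat MQ could be computed as follows. The function name `mq`, the input layout, and the toy graph are illustrative assumptions, not taken from the paper, and the variant studied in the paper may differ in detail.

```python
def mq(edges, clusters):
    """MQ of a flat clustering (assumed formulation, see lead-in).

    edges    -- iterable of distinct directed edges (u, v)
    clusters -- list of disjoint node sets covering every node
    """
    k = len(clusters)
    label = {node: i for i, c in enumerate(clusters) for node in c}
    sizes = [len(c) for c in clusters]

    intra = [0] * k   # mu_i: edges with both ends inside cluster i
    inter = {}        # eps_ij: edges between clusters i and j (i < j)
    for u, v in edges:
        i, j = label[u], label[v]
        if i == j:
            intra[i] += 1
        else:
            key = (min(i, j), max(i, j))
            inter[key] = inter.get(key, 0) + 1

    # Intra-connectivity A_i = mu_i / N_i^2
    a = [intra[i] / sizes[i] ** 2 for i in range(k)]
    if k == 1:
        return a[0]

    # Inter-connectivity E_ij = eps_ij / (2 * N_i * N_j), summed over pairs
    e = sum(n / (2 * sizes[i] * sizes[j]) for (i, j), n in inter.items())
    return sum(a) / k - e / (k * (k - 1) / 2)


# Toy graph: two directed 3-cycles joined by a single edge.
edges = [(0, 1), (1, 2), (2, 0), (3, 4), (4, 5), (5, 3), (2, 3)]
good = [{0, 1, 2}, {3, 4, 5}]   # follows the two cycles
bad = [{0, 1, 3}, {2, 4, 5}]    # cuts across them

print(mq(edges, good), mq(edges, bad))
```

On this toy graph the clustering that follows the two cycles scores higher than the one that cuts across them, which is the kind of comparison between two clusterings of the same graph that the abstract refers to.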