On the separability of structural classes of communities

Three major factors govern the intricacies of community extraction in networks: (1) the application domain includes a wide variety of networks of fundamentally different natures, (2) the literature offers a multitude of disparate community detection algorithms, and (3) there is no consensus on how to discriminate communities from non-communities. In this paper, we present a comprehensive analysis of community properties through a class separability framework. Our approach enables the assessment of the structural dissimilarity among the outputs of multiple community detection algorithms, and between the output of algorithms and communities that arise in practice. To demonstrate this concept, we furnish our method with a large set of structural properties and multiple community detection algorithms. Applied to a diverse collection of large-scale network datasets, the analysis reveals that (1) the different detection algorithms extract fundamentally different structures; (2) the structure of communities that arise in practice is closest to that of communities that random-walk-based algorithms extract, although still significantly different from that of the output of all the algorithms; and (3) a small subset of the properties is nearly as discriminative as the full set, while making explicit the ways in which the algorithms produce biases. Our framework enables an informed choice of the most suitable community detection method for a given purpose and network, and allows for a comparison of existing community detection algorithms while guiding the design of new ones.
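To make the idea of class separability over structural properties concrete, the following is a minimal sketch and not the paper's actual method. It assumes three illustrative properties per community (size, internal density, and the fraction of the community's edge volume that crosses its boundary) and uses a simple scatter ratio, tr(S_b) / tr(S_w), as a stand-in separability score. The function names `community_features` and `scatter_ratio`, the choice of properties, and the choice of score are all assumptions made for illustration; the abstract does not specify them.

```python
from statistics import mean


def community_features(adj, community):
    """Return (size, internal density, cut fraction) for one node set.

    adj is an undirected graph as a dict: node -> list of neighbours.
    The three properties are illustrative assumptions, not the paper's set.
    """
    nodes = set(community)
    n = len(nodes)
    # Each internal edge is seen from both endpoints, hence the halving.
    internal = sum(1 for u in nodes for v in adj[u] if v in nodes) // 2
    boundary = sum(1 for u in nodes for v in adj[u] if v not in nodes)
    possible = n * (n - 1) / 2
    density = internal / possible if possible else 0.0
    volume = 2 * internal + boundary  # sum of degrees inside the community
    cut_fraction = boundary / volume if volume else 0.0
    return (float(n), density, cut_fraction)


def scatter_ratio(classes):
    """Between-class over within-class scatter, tr(S_b) / tr(S_w).

    classes maps a class label (e.g. an algorithm name, hypothetical here)
    to a list of equal-length feature vectors. Larger values mean the
    classes are easier to tell apart in this feature space.
    """
    all_vecs = [v for vecs in classes.values() for v in vecs]
    dim = len(all_vecs[0])
    total = len(all_vecs)
    global_mean = [mean(v[d] for v in all_vecs) for d in range(dim)]
    between = 0.0
    within = 0.0
    for vecs in classes.values():
        class_mean = [mean(v[d] for v in vecs) for d in range(dim)]
        weight = len(vecs) / total
        between += weight * sum(
            (class_mean[d] - global_mean[d]) ** 2 for d in range(dim)
        )
        within += weight * mean(
            sum((v[d] - class_mean[d]) ** 2 for d in range(dim)) for v in vecs
        )
    return between / within if within else float("inf")


# Toy graph: two triangles joined by the single edge 2-3.
toy_adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
toy_features = community_features(toy_adj, [0, 1, 2])
```

In a fuller version of this sketch, each class would hold the feature vectors of all communities produced by one algorithm (or of the communities observed in practice), and the features would be standardized first so that no single property dominates the score because of its scale.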
