On the Detectability of Node Grouping in Networks

In typical studies of node grouping detection, the grouping is presumed to correlate with the network structure in a particular way (e.g., groups of densely connected nodes with only sparse connections between groups). Researchers have defined various fitness measures (modularity, conductance, etc.) to quantify this correlation, and they group nodes by optimizing a chosen measure. However, a particular grouping with the desired semantics, which is the actual target of detection, is not guaranteed to be detectable by every measure. We study a fundamental problem in node grouping discovery: given a particular grouping in a network, whether and to what extent it can be discovered with a given fitness measure. We propose two approaches to testing detectability, namely ranking-based and correlation-based randomization tests. Evaluation on both synthetic and real datasets shows that the proposed methods can effectively predict the detectability of groupings of various types and support the exploratory process of node grouping discovery.
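
To make the idea of a randomization test against a fitness measure concrete, the following is a minimal sketch, not the procedure proposed in the paper. It assumes modularity as the fitness measure and uses a plain label-permutation test: the score of the given grouping is compared with the scores of groupings obtained by shuffling the labels. The function names (`modularity`, `permutation_p_value`), the toy graph, and the number of trials are all illustrative assumptions.

```python
import random


def modularity(edges, labels):
    """Newman modularity Q of a grouping on an undirected, unweighted graph."""
    m = len(edges)
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    # Fraction of edges whose endpoints share a group.
    inside = sum(1 for u, v in edges if labels[u] == labels[v]) / m
    # Expected fraction of such edges if edges were rewired at random
    # while keeping node degrees fixed.
    group_degree = {}
    for node, d in degree.items():
        g = labels[node]
        group_degree[g] = group_degree.get(g, 0) + d
    expected = sum((d / (2 * m)) ** 2 for d in group_degree.values())
    return inside - expected


def permutation_p_value(edges, labels, trials=1000, seed=0):
    """Share of label shuffles scoring at least as high as the given grouping.

    Hypothetical helper: a generic permutation test, assumed here for
    illustration only. Shuffling keeps the group sizes fixed.
    """
    rng = random.Random(seed)
    nodes = sorted(labels)
    values = [labels[n] for n in nodes]
    observed = modularity(edges, labels)
    hits = 0
    for _ in range(trials):
        rng.shuffle(values)
        if modularity(edges, dict(zip(nodes, values))) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return observed, (hits + 1) / (trials + 1)


# Toy graph (an assumption): two triangles joined by a single edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
labels = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
q, p = permutation_p_value(edges, labels)
print(f"modularity = {q:.3f}, permutation p-value = {p:.3f}")
```

A small p-value here only says that the given grouping scores unusually high under this one measure relative to random relabelings. A grouping that is meaningful for other reasons can still score poorly, which is the situation the abstract describes.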
