Resampling Effects on Significance Analysis of Network Clustering and Ranking

Community detection helps us simplify the complex configuration of networks, but communities are reliable only if they are statistically significant. A common approach to detecting statistically significant communities is to resample the original network and analyze the communities in the resampled networks. But resampling assumes independence between samples, whereas the components of a network are inherently dependent. Therefore, we must understand how breaking dependencies between resampled components affects the results of the significance analysis. Here we use scientific communication as a model system to analyze this effect. Our dataset includes citations among articles published in journals in the years 1984–2010. We compare parametric resampling of citations with non-parametric article resampling. While citation resampling breaks link dependencies, article resampling maintains them. We find that citation resampling underestimates the variance of link weights, and that this underestimation explains most of the differences in the significance analysis of ranking and clustering. Therefore, when only link weights are available and article resampling is not an option, we suggest a simple parametric resampling scheme that generates link-weight variances close to those of article resampling. Nevertheless, when we highlight and summarize important structural changes in science, the more dependencies the resampling scheme maintains, the earlier we can predict structural change.
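To make the contrast between the two resampling schemes concrete, here is a minimal sketch under stated assumptions. Citation resampling is modeled by redrawing each journal-to-journal link weight independently from a Poisson distribution whose mean is the observed weight. Article resampling is modeled as a bootstrap over whole articles. The toy corpus, the function names (`link_weights`, `citation_resample`, `article_resample`, `link_variances`), and the sample sizes are all hypothetical. Neither function should be read as the authors' implementation. Because article resampling keeps every citation of a drawn article together, a link whose citations are concentrated in a few articles gets a larger weight variance than independent Poisson draws would give it.

```python
import math
import random
from collections import Counter
from statistics import pvariance

# Hypothetical toy corpus: (citing journal, Counter of cited journal -> count).
# The 20 A -> B citations sit in only two articles, so they are strongly dependent.
ARTICLES = (
    [("A", Counter({"B": 10})) for _ in range(2)]
    + [("A", Counter({"C": 3})) for _ in range(4)]
    + [("B", Counter({"A": 2})) for _ in range(4)]
)


def link_weights(articles):
    """Aggregate article-level citations into journal-to-journal link weights."""
    weights = Counter()
    for source, cited in articles:
        for target, count in cited.items():
            weights[(source, target)] += count
    return weights


def poisson(lam, rng):
    """Knuth's multiplication method; adequate for small means
    (exp(-lam) underflows for very large lam)."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def citation_resample(weights, rng):
    """Assumed parametric scheme: redraw each link weight independently as
    Poisson(observed weight), which breaks dependencies between citations."""
    return {link: poisson(w, rng) for link, w in weights.items()}


def article_resample(articles, rng):
    """Non-parametric bootstrap over whole articles: all citations of a
    drawn article move together, so their dependencies are preserved."""
    return link_weights(rng.choices(articles, k=len(articles)))


def link_variances(draw, links, n_samples):
    """Empirical variance of every link weight over n_samples calls to draw()."""
    samples = [draw() for _ in range(n_samples)]
    return {link: pvariance([s.get(link, 0) for s in samples]) for link in links}


if __name__ == "__main__":
    rng = random.Random(42)
    observed = link_weights(ARTICLES)
    cit = link_variances(lambda: citation_resample(observed, rng), observed, 2000)
    art = link_variances(lambda: article_resample(ARTICLES, rng), observed, 2000)
    for link in sorted(observed):
        print(f"{link}: weight={observed[link]}, "
              f"citation var={cit[link]:.1f}, article var={art[link]:.1f}")
```

Running the script prints each link's observed weight next to both empirical variances. For the A→B link, whose 20 citations come from only two articles, the article-resampling variance should be several times the Poisson variance. That gap is the kind of underestimation the abstract attributes to citation resampling.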
