The Impact of Partially Missing Communities on the Reliability of Centrality Measures

Network data is usually not error-free, and the absence of some nodes is a very common type of measurement error. Studies have shown that missing nodes severely affect the reliability of centrality measures. This paper investigates the reliability of centrality measures when missing nodes are likely to belong to the same community. We study the behavior of five commonly used centrality measures in uniform and scale-free networks under various error scenarios. We find that centrality measures are generally more reliable when missing nodes are likely to belong to the same community than when nodes are missing uniformly at random. In scale-free networks, however, betweenness centrality becomes less reliable when missing nodes are more likely to belong to the same community. Moreover, centrality measures in scale-free networks are more reliable in networks with stronger community structure; we do not observe this effect in uniform networks. Our observations suggest that the impact of missing nodes on the reliability of centrality measures might not be as severe as the literature suggests.
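The abstract does not spell out the error model or the reliability score, so the following is only a minimal sketch of how such an experiment could be set up. It is not the authors' procedure, and every modeling choice here is an assumption. A planted-partition random graph stands in for a network with community structure. The hypothetical parameter `focus` is the probability that a missing node is drawn from one target community, so `focus=0` means nodes missing uniformly at random. Degree centrality is used only because it is simple to compute; the abstract does not name its five measures. Reliability is scored as Kendall's tau-b between each surviving node's centrality in the full network and in the damaged network. Only the Python standard library is used.

```python
import math
import random
from itertools import combinations


def planted_partition(n_communities, size, p_in, p_out, seed=None):
    """Random graph with n_communities groups of `size` nodes each. An edge appears
    with probability p_in inside a group and p_out between groups."""
    rng = random.Random(seed)
    n = n_communities * size
    community = {v: v // size for v in range(n)}
    adj = {v: set() for v in range(n)}
    for u, v in combinations(range(n), 2):
        p = p_in if community[u] == community[v] else p_out
        if rng.random() < p:
            adj[u].add(v)
            adj[v].add(u)
    return adj, community


def remove_nodes(adj, community, fraction, focus, seed=None):
    """Hypothetical error model: delete round(fraction * n) nodes. Each deleted node
    comes from one randomly chosen target community with probability `focus`
    (while that community still has nodes), otherwise uniformly from all
    remaining nodes. focus=0 reduces to nodes missing uniformly at random."""
    rng = random.Random(seed)
    remaining = set(adj)
    k = round(fraction * len(remaining))
    target = rng.choice(sorted(set(community.values())))
    missing = set()
    for _ in range(k):
        in_target = [v for v in sorted(remaining) if community[v] == target]
        if in_target and rng.random() < focus:
            pool = in_target
        else:
            pool = sorted(remaining)
        v = rng.choice(pool)
        remaining.discard(v)
        missing.add(v)
    # The observed network keeps only surviving nodes and the edges among them.
    return {v: adj[v] - missing for v in remaining}


def degree_centrality(adj):
    """Raw degree; normalization is irrelevant for a rank correlation."""
    return {v: len(nbrs) for v, nbrs in adj.items()}


def kendall_tau_b(x, y):
    """Kendall's tau-b rank correlation, which corrects for ties."""
    concordant = discordant = only_x_tied = only_y_tied = 0
    for i, j in combinations(range(len(x)), 2):
        dx, dy = x[i] - x[j], y[i] - y[j]
        if dx == 0 and dy == 0:
            continue  # tied in both: contributes to neither denominator factor
        elif dx == 0:
            only_x_tied += 1
        elif dy == 0:
            only_y_tied += 1
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    # Each factor counts the pairs that are NOT tied in one of the two rankings.
    denom = math.sqrt((concordant + discordant + only_y_tied)
                      * (concordant + discordant + only_x_tied))
    return (concordant - discordant) / denom if denom else float("nan")


def reliability(adj, community, fraction, focus, seed=None):
    """Rank agreement between true centrality (full network) and observed
    centrality (damaged network), over the nodes that survive."""
    observed = remove_nodes(adj, community, fraction, focus, seed)
    true_c, obs_c = degree_centrality(adj), degree_centrality(observed)
    nodes = sorted(observed)
    return kendall_tau_b([true_c[v] for v in nodes], [obs_c[v] for v in nodes])


if __name__ == "__main__":
    adj, community = planted_partition(4, 25, p_in=0.3, p_out=0.02, seed=1)
    for focus in (0.0, 0.5, 1.0):
        taus = [reliability(adj, community, 0.2, focus, seed=s) for s in range(20)]
        print(f"focus={focus}: mean tau_b = {sum(taus) / len(taus):.3f}")
```

Averaging tau-b over several seeds for each `focus` value gives the kind of comparison the abstract describes (community-concentrated versus uniformly random missingness); the sketch itself implies no particular result. Studying another measure or a scale-free topology would mean replacing `degree_centrality` or `planted_partition`, both of which are illustrative stand-ins.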
