SAKE

Katz centrality is a fundamental measure of the influence of a vertex in a social network. However, existing approaches to calculating Katz centrality in a large-scale network are impractical and computationally expensive. In this article, we propose a novel method to estimate Katz centrality based on graph sampling techniques, which aims to achieve estimation accuracy comparable to the state of the art at much lower computational complexity. Specifically, we develop a Horvitz–Thompson estimate for Katz centrality by using a multi-round sampling approach and deriving an unbiased mean value estimator. We further propose SAKE, a Sampling-based Algorithm for fast Katz centrality Estimation. We prove that the estimate computed by SAKE is probabilistically guaranteed to be within an additive error of the exact value. Extensive experiments on four real-world networks show that the proposed algorithm can estimate Katz centralities for a subset of vertices with a low sampling rate and low computation time, and that it works well in identifying high-influence vertices in social networks.
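
For readers unfamiliar with the two building blocks named above, the following is a minimal illustrative sketch, not the SAKE algorithm itself. It assumes the standard definition of Katz centrality as an attenuated count of walks ending at a vertex (truncated here to a fixed number of rounds) and the textbook Horvitz–Thompson estimator of a population total. The function names `truncated_katz` and `horvitz_thompson_total`, the adjacency representation, and the toy graph are hypothetical choices made for illustration.

```python
from typing import Dict, List, Sequence


def truncated_katz(in_neighbors: Dict[int, List[int]],
                   alpha: float, rounds: int) -> Dict[int, float]:
    """Katz score of each vertex, counting only walks of length 1..rounds.

    in_neighbors[v] lists the vertices with an edge into v; every vertex
    must appear as a key. alpha is the attenuation factor.
    """
    walks = {v: 1.0 for v in in_neighbors}   # walks of length 0 ending at v
    score = {v: 0.0 for v in in_neighbors}
    weight = 1.0
    for _ in range(rounds):
        # A walk of length k ending at v extends a walk of length k-1
        # ending at one of v's in-neighbors.
        walks = {v: sum(walks[u] for u in in_neighbors[v])
                 for v in in_neighbors}
        weight *= alpha
        for v in in_neighbors:
            score[v] += weight * walks[v]
    return score


def horvitz_thompson_total(sampled_values: Sequence[float],
                           inclusion_probs: Sequence[float]) -> float:
    """Horvitz-Thompson estimate of a population total.

    Each sampled value is weighted by the inverse of the probability
    that its unit was included in the sample.
    """
    return sum(y / p for y, p in zip(sampled_values, inclusion_probs))


if __name__ == "__main__":
    # Toy undirected path 0 - 1 - 2 (an assumption for illustration).
    path = {0: [1], 1: [0, 2], 2: [1]}
    print(truncated_katz(path, alpha=0.1, rounds=2))
    # Two of three units sampled, each with inclusion probability 0.5.
    print(horvitz_thompson_total([2.0, 6.0], [0.5, 0.5]))
```

The inverse-probability weighting is what makes the Horvitz–Thompson estimate unbiased: averaged over all possible samples, each unit contributes exactly its own value. How the article combines this with multi-round sampling is not shown here.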
