Sampling Based Katz Centrality Estimation for Large-Scale Social Networks

Katz centrality is a fundamental measure of the influence of a vertex in a social network. However, existing approaches to calculating Katz centrality in a large-scale network are impractical and computationally expensive. In this paper, we propose a novel method to estimate Katz centrality based on graph sampling techniques. Specifically, we develop an unbiased estimator for Katz centrality using a multi-round sampling approach. We further propose SAKE, a Sampling-based Algorithm for fast Katz centrality Estimation. We prove that the estimate computed by SAKE is probabilistically guaranteed to be within an additive error of the exact value. The computational complexity of SAKE is much lower than that of state-of-the-art methods. Extensive evaluation experiments on four real-world networks show that the proposed algorithm achieves a low mean relative error at a low sampling rate, and that it works well in identifying high-influence vertices in social networks.
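To make the idea of a sampling-based, unbiased estimate concrete, the following is a minimal Python sketch. It is not SAKE, whose details the abstract does not give. It is a generic importance-weighted random-walk estimator for a truncated Katz score, assuming an undirected graph stored as an adjacency-list dictionary. The function names, the truncation length `max_len`, and the parameters `alpha` and `num_walks` are all illustrative assumptions. The sketch relies on the standard view of Katz centrality as an attenuated count of walks: a walk of length k contributes alpha^k.

```python
import random


def exact_truncated_katz(adj, v, alpha, max_len):
    """Exact sum over k = 1..max_len of alpha^k * (number of length-k walks from v)."""
    counts = {v: 1}  # counts[u] = number of walks of the current length from v to u
    total = 0.0
    for k in range(1, max_len + 1):
        nxt = {}
        for u, c in counts.items():
            for w in adj[u]:
                nxt[w] = nxt.get(w, 0) + c
        counts = nxt
        total += alpha ** k * sum(counts.values())
    return total


def sampled_truncated_katz(adj, v, alpha, max_len, num_walks, rng):
    """Random-walk estimate of the same quantity (illustrative, not SAKE).

    Each walk picks a uniformly random neighbor at every step and multiplies
    its weight by the degree of the vertex it leaves. The expected weight
    after k steps equals the number of length-k walks from v, so the average
    over walks is an unbiased estimate of the truncated score.
    """
    total = 0.0
    for _ in range(num_walks):
        u, weight = v, 1.0
        for k in range(1, max_len + 1):
            nbrs = adj[u]
            if not nbrs:
                break  # dead end: no longer walks pass through this vertex
            weight *= len(nbrs)
            u = rng.choice(nbrs)
            total += alpha ** k * weight
    return total / num_walks


if __name__ == "__main__":
    # Hypothetical toy graph: a star with center 0 and leaves 1..4.
    star = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]}
    rng = random.Random(42)
    print("exact  :", exact_truncated_katz(star, 1, 0.1, 4))
    print("sampled:", sampled_truncated_katz(star, 1, 0.1, 4, 1000, rng))
```

Because the graph is assumed undirected, walks starting at a vertex and walks ending at it are equal in number, so the sketch matches the usual in-walk definition of Katz centrality up to truncation. On a regular graph every walk carries the same weight, so the estimate is exact there. On irregular graphs the variance grows with the spread of degrees, which is where a concentration bound such as Hoeffding's inequality would be used to choose the number of samples.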
