Numerically approximating centrality for graph ranking guarantees

Abstract

Many real-world datasets can be represented as graphs. Applying iterative solvers to approximate graph centrality measures yields a ranking vector over the vertices of the graph, which assigns each vertex a score reflecting its relative importance. In this work we consider two centrality measures: Katz centrality and PageRank. Given an approximate solution, we use the residual to accurately estimate how much of the approximate ranking matches the ranking given by the exact solution. Using probabilistic matrix norms, we obtain bounds on the accuracy of the approximation relative to the exact solution with respect to the highly ranked vertices, applying numerical analysis to the computation of centrality with iterative methods. This relates the numerical accuracy of the linear solver to the data-analysis accuracy of finding the correct ranking. In particular, we answer the question of which pairwise rankings are reliable given an approximate solution to the linear system. Experiments on many real-world undirected and directed networks with up to several million vertices and several hundred million edges validate our theory and show that we can accurately estimate the correctness of large portions of the approximate ranking. We also analyze the difference between global centrality scores and personalized scores (with respect to specific seed vertices). By analyzing the convergence error, we develop confidence in the rankings produced by data mining. We show that we can guarantee the ranking of vertices from an approximation to centrality metrics faster than current methods.
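
The central idea above, turning the residual of an approximate solution into a guarantee on pairwise rankings, can be illustrated with a small sketch. It assumes the Katz system (I - αA)x = 1 with A a 0/1 adjacency matrix stored as neighbor lists, uses plain Jacobi iteration as the solver, and bounds the error with the deterministic Neumann-series estimate ||(I - αA)^{-1}||_∞ ≤ 1/(1 - α||A||_∞) in place of the probabilistic matrix-norm estimates the abstract refers to. All function names are hypothetical. Because the error e = x* - x satisfies (I - αA)e = r, every score is within ε = ||r||_∞ / (1 - α||A||_∞) of its exact value, so any pair with x_i - x_j > 2ε keeps its order in the exact solution.

```python
# Sketch: residual-based pairwise ranking guarantees for Katz centrality.
# Assumptions (not from the source): the Katz system is (I - alpha*A) x = 1,
# A is a 0/1 adjacency matrix stored as neighbor lists, the solver is plain
# Jacobi iteration, and the error bound is the deterministic Neumann-series
# bound in the infinity norm. All function names are hypothetical.


def matvec(adj, x):
    """Return A @ x, where adj[i] lists the columns j with A[i][j] = 1."""
    return [sum(x[j] for j in nbrs) for nbrs in adj]


def katz_jacobi(adj, alpha, iters):
    """Run `iters` Jacobi steps x <- 1 + alpha * A x, starting from x = 0."""
    x = [0.0] * len(adj)
    for _ in range(iters):
        x = [1.0 + alpha * v for v in matvec(adj, x)]
    return x


def residual(adj, alpha, x):
    """Return r = 1 - (I - alpha*A) x for the Katz system."""
    return [1.0 - xi + alpha * v for xi, v in zip(x, matvec(adj, x))]


def error_bound(adj, alpha, x):
    """Bound max_i |x*_i - x_i| by ||r||_inf / (1 - alpha * ||A||_inf).

    The error e = x* - x solves (I - alpha*A) e = r, and the Neumann series
    gives ||(I - alpha*A)^{-1}||_inf <= 1 / (1 - alpha * ||A||_inf).
    """
    norm_a = max(len(nbrs) for nbrs in adj)  # max row sum of a 0/1 matrix
    if alpha * norm_a >= 1.0:
        raise ValueError("this bound needs alpha * ||A||_inf < 1")
    r_inf = max(abs(v) for v in residual(adj, alpha, x))
    return r_inf / (1.0 - alpha * norm_a)


def reliable_pairs(x, eps):
    """Return pairs (i, j) whose order x_i > x_j must also hold exactly.

    Every score is within eps of its exact value, so x_i - x_j > 2*eps
    implies x*_i > x*_j.
    """
    n = len(x)
    return [(i, j) for i in range(n) for j in range(n) if x[i] - x[j] > 2.0 * eps]


if __name__ == "__main__":
    path = [[1], [0, 2], [1, 3], [2]]  # undirected path 0-1-2-3
    alpha = 0.2  # alpha * ||A||_inf = 0.4 < 1
    for iters in (2, 20):
        x = katz_jacobi(path, alpha, iters)
        eps = error_bound(path, alpha, x)
        print(iters, reliable_pairs(x, eps))
    # prints: 2 []
    #         20 [(1, 0), (1, 3), (2, 0), (2, 3)]
```

On the 4-vertex path, two iterations certify no pair, while twenty iterations certify that both interior vertices outrank both endpoints; the two interior vertices, tied by symmetry, are never certified against each other. The infinity-norm bound requires α·||A||_∞ < 1, which is stricter than the usual Katz condition α < 1/ρ(A) because ρ(A) ≤ ||A||_∞.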
