Approximating centrality in evolving graphs: toward sublinearity

The identification of important nodes is a ubiquitous problem in the analysis of social networks. Centrality indices such as degree centrality, closeness centrality, betweenness centrality, and PageRank are used across many domains for this task. However, computing these indices is expensive on large graphs, and evolving graphs are becoming increasingly important in many applications. It is therefore desirable to develop online algorithms that approximate centrality measures using memory sublinear in the size of the graph. We discuss the challenges facing the semi-streaming computation of many centrality indices. In particular, we apply recent advances in the streaming and sketching literature to provide two preliminary algorithms: a streaming approximation algorithm for degree centrality that uses CountSketch, and a multi-pass semi-streaming approximation algorithm for closeness centrality that uses a spanner obtained by iteratively sketching the vertex-edge adjacency matrix. We also discuss possible ways forward for approximating betweenness centrality and spectral measures of centrality, and we provide a preliminary result that uses sketched low-rank approximations to approximate the output of the HITS algorithm.
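The degree-centrality idea can be illustrated with a small CountSketch over the degree vector. This is a minimal sketch, not the paper's algorithm, and it assumes the graph arrives as a turnstile stream of `(u, v, delta)` edge updates with non-negative integer vertex ids, where `delta = +1` inserts an edge and `delta = -1` deletes it. Every update changes the degrees of both endpoints, so it is forwarded to the sketch twice. The names `CountSketch`, `sketch_degrees`, `depth`, `width`, and `seed`, and the simple modular hash family, are hypothetical choices made for illustration. Only point queries (the estimated degree of a given vertex) are shown.

```python
import random
import statistics


class CountSketch:
    """CountSketch over non-negative integer keys (here: vertex ids)."""

    _P = (1 << 61) - 1  # Mersenne prime used by the (assumed) hash family

    def __init__(self, depth: int, width: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.depth, self.width = depth, width
        self.table = [[0] * width for _ in range(depth)]
        # One (a, b) pair per row for the bucket hash and one for the sign hash.
        self.bucket_ab = [(rng.randrange(1, self._P), rng.randrange(self._P))
                          for _ in range(depth)]
        self.sign_ab = [(rng.randrange(1, self._P), rng.randrange(self._P))
                        for _ in range(depth)]

    def _bucket(self, row: int, key: int) -> int:
        a, b = self.bucket_ab[row]
        return ((a * key + b) % self._P) % self.width

    def _sign(self, row: int, key: int) -> int:
        a, b = self.sign_ab[row]
        return 1 if ((a * key + b) % self._P) & 1 else -1

    def update(self, key: int, delta: int) -> None:
        # Add the signed increment to one counter in every row.
        for r in range(self.depth):
            self.table[r][self._bucket(r, key)] += self._sign(r, key) * delta

    def estimate(self, key: int) -> float:
        # Each row's signed counter estimates the key's value; collisions with
        # other keys add noise, and the median across rows damps it.
        return statistics.median(
            self._sign(r, key) * self.table[r][self._bucket(r, key)]
            for r in range(self.depth)
        )


def sketch_degrees(edge_stream, depth: int = 5, width: int = 64,
                   seed: int = 0) -> CountSketch:
    """Consume (u, v, delta) updates: delta=+1 inserts an edge, -1 deletes it."""
    sketch = CountSketch(depth, width, seed)
    for u, v, delta in edge_stream:
        # An edge update changes the degree of both endpoints by delta.
        sketch.update(u, delta)
        sketch.update(v, delta)
    return sketch


# Usage: a star centered on vertex 0 with 20 leaves, then edges (0, 5) and (0, 6) deleted.
stream = [(0, leaf, +1) for leaf in range(1, 21)] + [(0, 5, -1), (0, 6, -1)]
sketch = sketch_degrees(stream, depth=5, width=64, seed=1)
print(sketch.estimate(0), sketch.estimate(7))  # true degrees: 18 and 1
```

The sketch stores `depth × width` counters regardless of how many vertices the graph has, and because every update is linear, deletions follow the same code path as insertions. Reporting the highest-degree vertices, rather than answering point queries, would additionally require tracking candidate vertices, which this sketch omits.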
