Scalable Betweenness Centrality Maximization via Sampling

Betweenness centrality (BWC) is a fundamental centrality measure in social network analysis. Given a large-scale network, how can we find the most central nodes? This question matters to many applications that rely on BWC, including community detection and understanding graph vulnerability. Despite the large body of work on scalable approximation algorithms for BWC, estimating BWC on large-scale networks remains a computational challenge. In this paper, we study the Centrality Maximization problem (CMP): given a graph G = (V, E) and a positive integer k, find a set S* ⊆ V that maximizes BWC subject to the cardinality constraint |S*| ≤ k. We present an efficient randomized algorithm that provides a (1 − 1/e − ε)-approximation with high probability, where ε > 0. Our results improve the current state-of-the-art result [40]. Furthermore, we provide the first theoretical evidence for the validity of a crucial assumption in betweenness centrality estimation, namely that in real-world networks O(|V|^2) shortest paths pass through the top-k central nodes, where k is a constant. This also explains why our algorithm runs in near-linear time on real-world networks. By providing a general analytical framework, we also show that our algorithm and analysis apply to a wider range of centrality measures. On the experimental side, we evaluate our method extensively on real-world networks, demonstrate its accuracy and scalability, and study different properties of central nodes. We then compare the sampling method used by the state-of-the-art algorithm with our method. We also study BWC in time-evolving networks and examine how the centrality of the central nodes changes over time. Finally, we compare the performance of the stochastic Kronecker model [28] to real data, and observe that it generates a similar growth pattern.
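The sampling idea behind such an approach can be illustrated with a minimal sketch. It is not the algorithm of this paper: it assumes an unweighted, undirected graph given as an adjacency dictionary, samples one uniformly random shortest path per random node pair, and then runs the standard greedy maximum-coverage heuristic over the sampled paths. The function names `sample_shortest_path` and `greedy_cover` are hypothetical and chosen for illustration only, and the sketch makes no claim about the number of samples needed for the (1 − 1/e − ε) guarantee.

```python
import random
from collections import deque


def sample_shortest_path(adj, rng):
    """Inner nodes of one uniformly random shortest path between a random node pair."""
    s, t = rng.sample(list(adj), 2)
    # BFS from s, counting shortest paths (sigma) and recording predecessors.
    dist, sigma, preds = {s: 0}, {s: 1}, {s: []}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                sigma[v] = 0
                preds[v] = []
                queue.append(v)
            if dist[v] == dist[u] + 1:
                sigma[v] += sigma[u]
                preds[v].append(u)
    if t not in dist:
        return frozenset()  # s and t are disconnected: nothing to cover
    # Walk back from t, picking each predecessor with probability
    # proportional to its path count, which yields a uniform shortest path.
    inner, v = [], t
    while v != s:
        ps = preds[v]
        v = rng.choices(ps, weights=[sigma[p] for p in ps])[0]
        if v != s:
            inner.append(v)
    return frozenset(inner)


def greedy_cover(paths, k):
    """Greedily pick up to k nodes that hit the most not-yet-covered sampled paths."""
    uncovered = [p for p in paths if p]
    chosen = []
    for _ in range(k):
        counts = {}
        for p in uncovered:
            for v in p:
                counts[v] = counts.get(v, 0) + 1
        if not counts:
            break  # every sampled path is already covered
        best = max(counts, key=counts.get)
        chosen.append(best)
        uncovered = [p for p in uncovered if best not in p]
    return chosen


# Usage on a small star graph: node 0 is the hub, nodes 1..5 are leaves.
adj = {0: [1, 2, 3, 4, 5], 1: [0], 2: [0], 3: [0], 4: [0], 5: [0]}
rng = random.Random(0)
paths = [sample_shortest_path(adj, rng) for _ in range(200)]
top = greedy_cover(paths, 2)
```

In the star graph every shortest path between two leaves passes through the hub, so the greedy step selects the hub first and then stops because no sampled path remains uncovered. The fraction of sampled paths hit by the chosen set serves as an estimate of the normalized centrality of that set under these assumptions.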

[1]  Azer Bestavros, et al.  A Framework for the Evaluation and Management of Network Centrality, 2011, SDM.

[2]  Mark E. J. Newman.  A measure of betweenness centrality based on random walks, 2005, Soc. Networks.

[3]  Rami Puzis, et al.  Incremental deployment of network monitors based on Group Betweenness Centrality, 2009, Inf. Process. Lett.

[4]  Lorenzo De Stefani, et al.  TRIÈST: Counting Local and Global Triangles in Fully Dynamic Streams with Fixed Memory Size, 2017, ACM Trans. Knowl. Discov. Data.

[5]  Mihail N. Kolountzakis, et al.  Triangle Sparsifiers, 2011, J. Graph Algorithms Appl.

[6]  Charalampos E. Tsourakakis, et al.  Colorful triangle counting and a MapReduce implementation, 2011, Inf. Process. Lett.

[7]  Ulrik Brandes, et al.  Centrality Estimation in Large Networks, 2007, Int. J. Bifurc. Chaos.

[8]  B. Mohar, et al.  Graph Minors, 2009.

[9]  Paul D. Seymour, et al.  Graph Minors. II. Algorithmic Aspects of Tree-Width, 1986, J. Algorithms.

[10]  Christian Borgs, et al.  Maximizing Social Influence in Nearly Optimal Time, 2012, SODA.

[11]  Martin Fink, et al.  Maximum Betweenness Centrality: Approximability and Tractable Cases, 2011, WALCOM.

[12]  Charalampos E. Tsourakakis.  The K-clique Densest Subgraph Problem, 2015, WWW.

[13]  Shlomo Moran, et al.  SALSA: the stochastic approach for link-structure analysis, 2001, TOIS.

[14]  Kathryn Fraughnaugh, et al.  Introduction to graph theory, 1973, Mathematical Gazette.

[15]  Pu Gao, et al.  On the Longest Paths and the Diameter in Random Apollonian Networks, 2013, Electron. Notes Discret. Math.

[16]  Yuichi Yoshida, et al.  Almost linear-time algorithms for adaptive betweenness centrality using hypergraph sketches, 2014, KDD.

[17]  Jakub W. Pachocki, et al.  Scalable Large Near-Clique Detection in Large-Scale Networks via Sampling, 2015, KDD.

[18]  Albert-László Barabási, et al.  Error and attack tolerance of complex networks, 2000, Nature.

[19]  T. Killingback, et al.  Attack Robustness and Centrality of Complex Networks, 2013, PloS one.

[20]  Moses Charikar, et al.  Greedy approximation algorithms for finding dense components in a graph, 2000, APPROX.

[21]  U. Brandes.  A faster algorithm for betweenness centrality, 2001.

[22]  Lorenzo De Stefani, et al.  TRIÈST: Counting Local and Global Triangles in Fully-Dynamic Streams with Fixed Memory Size, 2016, KDD.

[23]  Christos Faloutsos, et al.  Graph evolution: Densification and shrinking diameters, 2006, TKDD.

[24]  Albert, et al.  Emergence of scaling in random networks, 1999, Science.

[25]  Tamás F. Móri, et al.  The Maximum Degree of the Barabási–Albert Random Tree, 2005, Combinatorics, Probability and Computing.

[26]  Adriana Iamnitchi, et al.  K-path centrality: a new centrality measure in social networks, 2011, SNS '11.

[27]  Rajeev Motwani, et al.  The PageRank Citation Ranking: Bringing Order to the Web, 1999, WWW 1999.

[28]  M E J Newman, et al.  Community structure in social and biological networks, 2001, Proceedings of the National Academy of Sciences of the United States of America.

[29]  Sergei Vassilvitskii, et al.  Counting triangles and the curse of the last reducer, 2011, WWW.

[30]  Azer Bestavros, et al.  A Divide-and-Conquer Algorithm for Betweenness Centrality, 2014, SDM.

[31]  Ümit V. Çatalyürek, et al.  Shattering and Compressing Networks for Centrality Analysis, 2012, ArXiv.

[32]  Leo Katz, et al.  A new status index derived from sociometric analysis, 1953.

[33]  Mihail N. Kolountzakis, et al.  Efficient Triangle Counting in Large Graphs via Degree-Based Vertex Partitioning, 2010, Internet Math.

[34]  Kristina Lerman, et al.  The interplay between dynamics and networks: centrality, communities, and cheeger inequality, 2014, KDD.

[35]  Christos Faloutsos, et al.  Realistic, Mathematically Tractable Graph Generation and Evolution, Using Kronecker Multiplication, 2005, PKDD.

[36]  Evgenios M. Kornaropoulos, et al.  Fast approximation of betweenness centrality through sampling, 2014, Data Mining and Knowledge Discovery.

[37]  Alan M. Frieze, et al.  Some Properties of Random Apollonian Networks, 2014, Internet Math.

[38]  Beom Jun Kim, et al.  Attack vulnerability of complex networks, 2002, Physical review. E, Statistical, nonlinear, and soft matter physics.

[39]  Laurence A. Wolsey, et al.  Best Algorithms for Approximating the Maximum of a Submodular Set Function, 1978, Math. Oper. Res.

[40]  Alex Bavelas.  A Mathematical Model for Group Structures, 1948.

[41]  Yong Gao.  The degree distribution of random k-trees, 2009, Theor. Comput. Sci.

[42]  Sebastiano Vigna, et al.  Axioms for Centrality, 2013, Internet Math.

[43]  Éva Tardos, et al.  Maximizing the Spread of Influence through a Social Network, 2015, Theory Comput.

[44]  Blair D. Sullivan, et al.  Tree decompositions and social graphs, 2014, Internet Math.

[45]  Devavrat Shah, et al.  Rumor centrality: a universal source detector, 2012, SIGMETRICS '12.

[46]  Tamara G. Kolda, et al.  Triadic Measures on Graphs: The Power of Wedge Sampling, 2012, SDM.

[47]  Jeffrey Xu Yu, et al.  Triangle minimization in large networks, 2014, Knowledge and Information Systems.

[48]  David A. Bader, et al.  Approximating Betweenness Centrality, 2007, WAW.

[49]  Linton C. Freeman.  A set of measures of centrality based upon betweenness, 1977.