Measuring and maximizing group closeness centrality over disk-resident graphs

Group closeness centrality is an important graph metric that measures how close a group of vertices is to all other vertices in a graph. It is used in numerous graph applications, for example to measure the dominance and influence of a node group over the graph. However, when a large-scale graph contains hundreds of millions of nodes and edges and cannot reside entirely in a computer's main memory, measuring and maximizing group closeness become challenging tasks. In this paper, we present a systematic solution for efficiently calculating and maximizing group closeness on disk-resident graphs. Our solution first leverages a probabilistic counting method to estimate group closeness efficiently and with high accuracy, rather than computing it exactly and exhaustively. In addition, we design an I/O-efficient greedy algorithm to find a node group that maximizes group closeness. The proposed algorithm significantly reduces the number of random disk accesses, thereby dramatically improving computational efficiency. Experiments on large real-world graphs demonstrate the efficacy of our approach.
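
To make the two tasks named in the abstract concrete, the following is a minimal in-memory sketch of exact group closeness and a naive greedy maximizer. It is not the paper's method: it uses exact multi-source breadth-first search instead of probabilistic counting, and it makes no attempt at I/O efficiency. The abstract does not state a formula, so the definition used here (number of reached non-group vertices divided by the sum of their distances to the nearest group member, on an unweighted graph, ignoring unreachable vertices) is an assumption, and the names `group_farness`, `group_closeness`, and `greedy_group` are hypothetical.

```python
from collections import deque


def group_farness(adj, group):
    """Multi-source BFS from all group members at once.

    Returns (sum of distances to reached non-group vertices,
             number of reached non-group vertices).
    Assumes an unweighted graph given as {vertex: [neighbors]}.
    """
    dist = {v: 0 for v in group}      # every group member starts at distance 0
    queue = deque(dist)               # deduplicated sources
    sources = len(dist)
    total = 0
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1  # distance to the nearest group member
                total += dist[w]
                queue.append(w)
    return total, len(dist) - sources


def group_closeness(adj, group):
    """Assumed definition: reached non-group vertices / total distance."""
    total, others = group_farness(adj, group)
    return others / total if total > 0 else 0.0


def greedy_group(adj, k):
    """Naive greedy: repeatedly add the vertex that yields the highest closeness.

    Each step runs one BFS per candidate, so this is only practical for
    small graphs that fit in memory.
    """
    group = []
    for _ in range(k):
        candidates = [v for v in adj if v not in group]
        best = max(candidates, key=lambda v: group_closeness(adj, group + [v]))
        group.append(best)
    return group


# Path graph 0 - 1 - 2 - 3 - 4
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(group_closeness(path, [2]))
print(greedy_group(path, 1))
```

On the five-vertex path, the middle vertex is the best single-vertex group under this definition. The sketch re-runs a full traversal for every candidate in every greedy round, which is the kind of exhaustive cost that the estimation and I/O-efficient techniques described in the abstract aim to avoid.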
