Eccentricity Heuristics through Sublinear Analysis Lenses

The eccentricity of a node in a graph is its maximal shortest-path distance to any other node. Computing all eccentricities is a basic task in large-scale graph mining. Shun (KDD 2015) empirically studied two simple heuristics for this task: k-BFS1, based on parallel BFS from a small sample of nodes, was shown to work well on a variety of graphs; k-BFS2, a two-phase version, was shown to outperform state-of-the-art algorithms by up to orders of magnitude. This empirical success stands in apparent contrast to recent theoretical hardness results on approximating all eccentricities (Backurs et al., STOC 2018). This paper aims to formally explain the performance of these heuristics by studying them through computational models designed for sublinear-time or sublinear-space algorithms. We use the proposed framework to derive improved variants, which retain their practicality while having better performance and formal guarantees.

1. For k-BFS1, we draw a connection to diameter property testing (Parnas and Ron, Random Struct. Alg. 2002). It is not hard to observe that k-BFS1 essentially tests the values of all eccentricities simultaneously, in the classical property testing sense. We show that the same guarantee is achieved by a more efficient algorithm, whose work is nearly linear in the number of nodes and independent of the number of edges. By utilizing the connection in the opposite direction, we also obtain some results on classical testing of the graph radius and diameter.

2. For k-BFS2, we draw a connection to the streaming Set Cover algorithm of Demaine et al. (DISC 2014). We use it to suggest a variant of k-BFS2 with similar work and depth bounds, which is guaranteed to compute almost all eccentricities exactly if the graph satisfies a condition we call small eccentric cover. The condition can be ascertained for all real-world graphs used in Shun (KDD 2015) and in our experiments.

Our experimental results on real-world graphs demonstrate the validity of our analysis and the empirical advantage of the proposed algorithms.
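
To make the definition of eccentricity and the sampled-BFS idea concrete, the following is a minimal sequential sketch, not the implementation evaluated in the paper or in Shun (KDD 2015). It assumes a connected, undirected, unweighted graph given as an adjacency dictionary. The function names (bfs_distances, exact_eccentricities, sampled_bfs_estimates) and the seed parameter are hypothetical and chosen only for illustration. The estimate for each node is the largest distance to any sampled source, which is always a lower bound on its true eccentricity.

```python
import random
from collections import deque


def bfs_distances(adj, source):
    """Return hop distances from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def exact_eccentricities(adj):
    """Exact eccentricities via one BFS per node (assumes a connected graph)."""
    return {v: max(bfs_distances(adj, v).values()) for v in adj}


def sampled_bfs_estimates(adj, k, seed=0):
    """Lower-bound every eccentricity using BFS from k random sources.

    Illustrative sketch in the spirit of k-BFS1: the BFS runs here are
    sequential, whereas the heuristic described above runs them in parallel.
    """
    rng = random.Random(seed)
    sources = rng.sample(sorted(adj), min(k, len(adj)))
    estimate = {v: 0 for v in adj}
    for s in sources:
        for v, d in bfs_distances(adj, s).items():
            # In an undirected graph d(s, v) = d(v, s), so d lower-bounds ecc(v).
            estimate[v] = max(estimate[v], d)
    return estimate


# Toy input: the path graph 0 - 1 - 2 - 3 - 4.
path = {i: [j for j in (i - 1, i + 1) if 0 <= j < 5] for i in range(5)}
exact = exact_eccentricities(path)
estimate = sampled_bfs_estimates(path, k=2)
```

On this toy path the exact routine performs one BFS per node, while the sampled routine performs only k of them; the estimates coincide with the exact values once every node has been used as a source.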

[1] Reinhard Schneider, et al. Using graph theory to analyze biological networks, 2011, BioData Mining.

[2] Liam Roditty, et al. Fast approximation algorithms for the diameter and radius of sparse graphs, 2013, STOC '13.

[3] Ümit V. Çatalyürek, et al. Regularizing graph centrality computations, 2015, J. Parallel Distributed Comput.

[4] Dana Ron, et al. Testing the diameter of graphs, 1999, RANDOM-APPROX.

[5] David Steurer, et al. Analytical approach to parallel repetition, 2013, STOC.

[6] H. Howie Huang, et al. iBFS: Concurrent Breadth-First Search on GPUs, 2016, SIGMOD Conference.

[7] Russell Impagliazzo, et al. On the Complexity of k-SAT, 2001, J. Comput. Syst. Sci.

[8] R. Ryan Williams, et al. Some Estimated Likelihoods for Computational Complexity, 2019, Computing and Software Science.

[9] Jorge C. S. Cardoso, et al. Probabilistic Estimation of Network Size and Diameter, 2009, 2009 Fourth Latin-American Symposium on Dependable Computing.

[10] Ana Paula Appel, et al. HADI: Mining Radii of Large Graphs, 2011, TKDD.

[11] Liam Roditty, et al. Towards tight approximation bounds for graph diameter and eccentricities, 2018, STOC.

[12] P. Flajolet, et al. HyperLogLog: the analysis of a near-optimal cardinality estimation algorithm, 2007.

[13] Raquel Menezes, et al. Extrema Propagation: Fast Distributed Estimation of Sums and Network Sizes, 2012, IEEE Transactions on Parallel and Distributed Systems.

[14] Robert E. Tarjan, et al. Better Approximation Algorithms for the Graph Diameter, 2014, SODA.

[15] Philippe Flajolet, et al. Probabilistic Counting Algorithms for Data Base Applications, 1985, J. Comput. Syst. Sci.

[16] Karl Henrik Johansson, et al. Distributed Estimation of Diameter, Radius and Eccentricities in Anonymous Networks, 2012.

[17] Joshua R. Wang, et al. Approximation and Fixed Parameter Subquadratic Algorithms for Radius and Diameter in Sparse Graphs, 2016, SODA.

[18] Oded Goldreich, et al. Introduction to Property Testing, 2017.

[19] Walter A. Kosters, et al. Determining the diameter of small world networks, 2011, CIKM '11.

[20] Dana Ron, et al. Tight Bounds for Testing Bipartiteness in General Graphs, 2004, SIAM J. Comput.

[21] Karem A. Sakallah, et al. Computing Vertex Eccentricity in Exponentially Large Graphs: QBF Formulation and Solution, 2003, SAT.

[22] Raquel Menezes, et al. Fast Estimation of Aggregates in Unstructured Networks, 2009, 2009 Fifth International Conference on Autonomic and Autonomous Systems.

[23] Walter A. Kosters, et al. Computing the Eccentricity Distribution of Large Graphs, 2013, Algorithms.

[24] Huy T. Vo, et al. The More the Merrier: Efficient Multi-Source Graph Traversal, 2014, Proc. VLDB Endow.

[25] Piotr Indyk, et al. On Streaming and Communication Complexity of the Set Cover Problem, 2014, DISC.

[26] Damien Magoni, et al. Analysis and Comparison of Internet Topology Generators, 2002, NETWORKING.

[27] Roberto Grossi, et al. New Bounds for Approximating Extremal Distances in Undirected Graphs, 2016, SODA.

[28] Roger Pearce, et al. Computing Exact Vertex Eccentricity on Massive-Scale Distributed Graphs, 2018, 2018 IEEE International Conference on Cluster Computing (CLUSTER).

[29] Guy E. Blelloch, et al. Linear-work greedy parallel approximate set cover and variants, 2011, SPAA '11.

[30] Julian Shun, et al. An Evaluation of Parallel Eccentricity Estimation Algorithms on Undirected Real-World Graphs, 2015, KDD.

[31] Guy E. Blelloch, et al. Parallel and I/O efficient set covering algorithms, 2012, SPAA '12.