Testing Graph Clusterability: Algorithms and Lower Bounds

We consider the problem of testing graph cluster structure: given access to a graph G = (V, E), can we quickly determine whether the graph can be partitioned into a few clusters with good inner conductance, or is far from any such graph? This generalizes the well-studied problem of testing graph expansion, where one wants to distinguish between the graph having good expansion (i.e., being a single good cluster) and the graph having a sparse cut (i.e., being a union of at least two clusters). A recent work of Czumaj, Peng, and Sohler (STOC'15) gave an ingenious sublinear-time algorithm for testing k-clusterability in time Õ(n^{1/2} poly(k)). Their algorithm implicitly embeds a random sample of vertices of the graph into Euclidean space, and then clusters the samples based on estimates of the Euclidean distances between the points. This yields a very efficient testing algorithm, but it only works if the cluster structure is very strong: one must assume that the gap between the conductances of accepted and rejected graphs is at least logarithmic in the size of the graph G. In this paper we show how one can leverage more refined geometric information, namely angles as opposed to distances, to obtain a sublinear-time tester that works even when the gap is a sufficiently large constant. Our tester is based on the singular value decomposition of a natural matrix derived from random walk transition probabilities from a small sample of seed nodes. We complement our algorithm with a matching lower bound on the query complexity of testing clusterability. Our lower bound is based on a novel property testing problem, which we analyze using Fourier analytic tools. As a byproduct of our techniques, we also obtain new lower bounds for the problem of approximating the MAX-CUT value in sublinear time.
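To make the geometric idea concrete, the following minimal Python sketch compares the angles between random walk distributions started from different seed nodes. It is an illustration only and not the tester described above: it computes exact walk distributions on a tiny graph instead of estimating them from sampled walks, and it compares cosines directly rather than taking a singular value decomposition. The toy graph (two cliques joined by one edge), the walk length, the lazy-walk convention, and all function names are assumptions made for this example.

```python
import math


def walk_distribution(adj, seed, steps):
    """Exact distribution of a lazy random walk from `seed` after `steps` steps.

    `adj` is an adjacency list; every vertex is assumed to have degree >= 1.
    """
    dist = [0.0] * len(adj)
    dist[seed] = 1.0
    for _ in range(steps):
        nxt = [0.0] * len(adj)
        for u, mass in enumerate(dist):
            if mass == 0.0:
                continue
            nxt[u] += mass / 2                 # lazy step: stay put with probability 1/2
            share = mass / (2 * len(adj[u]))   # otherwise move to a uniform neighbor
            for v in adj[u]:
                nxt[v] += share
        dist = nxt
    return dist


def cosine(p, q):
    """Cosine of the angle between two vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    return dot / (norm_p * norm_q)


def two_cliques(m):
    """Toy 2-clusterable graph (hypothetical): two m-cliques joined by one edge."""
    n = 2 * m
    adj = [[] for _ in range(n)]
    for block in (range(0, m), range(m, n)):
        for u in block:
            for v in block:
                if u != v:
                    adj[u].append(v)
    adj[m - 1].append(m)   # the single bridge edge between the clusters
    adj[m].append(m - 1)
    return adj


adj = two_cliques(8)
steps = 6                          # assumed walk length, chosen by hand for this toy graph
p_a = walk_distribution(adj, 0, steps)
p_b = walk_distribution(adj, 1, steps)    # seed in the same clique as vertex 0
p_c = walk_distribution(adj, 15, steps)   # seed in the other clique

same = cosine(p_a, p_b)    # seeds in the same cluster: nearly parallel vectors
cross = cosine(p_a, p_c)   # seeds in different clusters: nearly orthogonal vectors
print(f"same-cluster cosine:  {same:.3f}")
print(f"cross-cluster cosine: {cross:.3f}")
```

On this toy instance the walk mixes within a clique long before much probability mass crosses the bridge, so seeds in the same cluster yield nearly parallel distributions while seeds in different clusters yield nearly orthogonal ones. A sublinear-time tester cannot afford exact distributions and must work from estimates obtained by sampling walks.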
