Almost Optimal Local Graph Clustering Using Evolving Sets

Spectral partitioning is a simple, nearly linear time algorithm for finding sparse cuts, and the Cheeger inequalities provide a worst-case guarantee on the quality of the approximation it finds. A <i>local graph partitioning algorithm</i> finds a set of vertices with small conductance (i.e., a sparse cut) by adaptively exploring part of a large graph <i>G</i>, starting from a specified vertex. For the algorithm to be local, its complexity must be bounded in terms of the size of the set that it outputs, with at most a weak dependence on the number <i>n</i> of vertices in <i>G</i>. Previous local partitioning algorithms find sparse cuts using random walks and personalized PageRank [Spielman and Teng 2013; Andersen et al. 2006]. In this article, we introduce a simple randomized local partitioning algorithm that finds a sparse cut by simulating the <i>volume-biased evolving set process</i>, which is a Markov chain on sets of vertices. We prove that for any ε > 0 and any set of vertices <i>A</i> that has conductance at most φ, for at least half of the starting vertices in <i>A</i> our algorithm will output (with constant probability) a set of conductance <i>O</i>(√(φ/ε)). We prove that for a given run of the algorithm, the expected ratio between its computational complexity and the volume of the set that it outputs is vol(<i>A</i>)<sup>ε</sup>φ<sup>−1/2</sup> polylog(<i>n</i>), where vol(<i>A</i>) = ∑<sub><i>v</i> ∈ <i>A</i></sub><i>d</i>(<i>v</i>) is the volume of the set <i>A</i>. This gives an algorithm with the same guarantee (up to a constant factor) as Cheeger’s inequality that runs in time slightly superlinear in the size of the output. This is the first sublinear (in the size of the input) time algorithm with almost the same guarantee as Cheeger’s inequality. In comparison, the best previous local partitioning algorithm, by Andersen et al. [2006], has a worse approximation guarantee of <i>O</i>(√(φ log <i>n</i>)) and a larger ratio of φ<sup>−1</sup> polylog(<i>n</i>) between the complexity and output volume. As a by-product of our results, we obtain a bicriteria approximation algorithm for the expansion profile of any graph. For 0 < <i>k</i> ≤ vol(<i>V</i>)/2, let φ(<i>k</i>) ≔ min<sub><i>S</i>: vol(<i>S</i>) ≤ <i>k</i></sub> φ(<i>S</i>). There is a polynomial time algorithm that, for any <i>k</i> and any ε > 0, finds a set <i>S</i> of volume vol(<i>S</i>) ≤ <i>O</i>(<i>k</i><sup>1 + ε</sup>) and expansion φ(<i>S</i>) ≤ <i>O</i>(√(φ(<i>k</i>)/ε)). As a new technical tool, we show that for any set <i>S</i> of vertices of a graph, a lazy <i>t</i>-step random walk started from a randomly chosen vertex of <i>S</i> will remain entirely inside <i>S</i> with probability at least (1 − φ(<i>S</i>)/2)<sup><i>t</i></sup>. This in turn provides a new lower bound on the uniform mixing time of any finite state reversible Markov chain.
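
The quantities in the abstract can be checked numerically on a toy graph. The sketch below is illustrative only and is not the article's algorithm: it computes vol(S) and φ(S) for a set S and the exact probability that a lazy walk stays inside S for t steps, then compares that probability with (1 − φ(S)/2)^t. It rests on two assumptions not spelled out in the abstract: the lazy walk stays put with probability 1/2, and the starting vertex is drawn from S with probability proportional to its degree. The function names and the example graph are hypothetical.

```python
from typing import Dict, List, Set

Graph = Dict[int, List[int]]  # undirected graph stored as adjacency lists


def volume(graph: Graph, S: Set[int]) -> int:
    """vol(S): sum of the degrees of the vertices in S."""
    return sum(len(graph[v]) for v in S)


def conductance(graph: Graph, S: Set[int]) -> float:
    """phi(S): number of edges leaving S divided by vol(S)."""
    cut = sum(1 for v in S for u in graph[v] if u not in S)
    return cut / volume(graph, S)


def stay_probability(graph: Graph, S: Set[int], t: int) -> float:
    """Exact probability that a lazy walk never leaves S during t steps.

    Assumptions (not stated in the abstract): the start vertex is drawn
    from S proportionally to degree, and each step stays put with
    probability 1/2, otherwise moves to a uniformly random neighbour.
    """
    vol_S = volume(graph, S)
    # mass[v] = probability of being at v without having left S so far
    mass = {v: len(graph[v]) / vol_S for v in S}
    for _ in range(t):
        nxt = {v: 0.5 * mass[v] for v in S}  # lazy half stays in place
        for v in S:
            share = 0.5 * mass[v] / len(graph[v])
            for u in graph[v]:
                if u in S:  # mass sent outside S is discarded
                    nxt[u] += share
        mass = nxt
    return sum(mass.values())


if __name__ == "__main__":
    # Two triangles joined by the single edge 2-3.
    g: Graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
                3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
    S = {0, 1, 2}
    phi = conductance(g, S)
    for t in (1, 5, 20):
        print(t, stay_probability(g, S, t), (1 - phi / 2) ** t)
```

On this example the set {0, 1, 2} has volume 7 and one outgoing edge, so for t = 1 the stay probability equals the bound exactly; for larger t the printed stay probability should be at least the bound.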

[1]  Ravi Montenegro.  The simple random walk and max-degree walk on a directed graph , 2009, Random Struct. Algorithms.

[2]  G. R. Blakley,et al.  A Hölder type inequality for symmetric matrices with nonnegative entries , 1965 .

[3]  Jure Leskovec,et al.  Empirical comparison of algorithms for network community detection , 2010, WWW '10.

[4]  C. Lee Giles,et al.  Efficient identification of Web communities , 2000, KDD '00.

[5]  David London.  Inequalities in quadratic forms , 1966 .

[6]  Yuval Peres,et al.  Evolving sets and mixing , 2003, STOC '03.

[7]  Fan Chung Graham,et al.  Local Graph Partitioning using PageRank Vectors , 2006, 2006 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS'06).

[8]  Prasad Raghavendra,et al.  Approximations for the isoperimetric and spectral profile of graphs and related parameters , 2010, STOC '10.

[9]  Ravi Montenegro.  Sharp edge, vertex, and mixed Cheeger inequalities for finite Markov kernels , 2007 .

[10]  V. Climenhaga.  Markov chains and mixing times , 2013 .

[11]  Shang-Hua Teng,et al.  A Local Clustering Algorithm for Massive Graphs and Its Application to Nearly Linear Time Graph Partitioning , 2008, SIAM J. Comput.

[12]  N. Alon,et al.  λ1, Isoperimetric Inequalities for Graphs, and Superconcentrators , 1985 .

[13]  Jonah Sherman,et al.  Breaking the Multicommodity Flow Barrier for O(√log n)-Approximations to Sparsest Cut , 2009, 2009 50th Annual IEEE Symposium on Foundations of Computer Science.

[14]  Mark Jerrum,et al.  Approximating the Permanent , 1989, SIAM J. Comput.

[15]  Gary L. Miller,et al.  Graph Partitioning by Spectral Rounding: Applications in Image Segmentation and Clustering , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[16]  Elad Hazan,et al.  O(√log n) approximation to SPARSEST CUT in Õ(n²) time , 2004, 45th Annual IEEE Symposium on Foundations of Computer Science.

[17]  László Lovász,et al.  Faster mixing via average conductance , 1999, STOC '99.

[18]  P. Diaconis,et al.  Strong Stationary Times Via a New Form of Duality , 1990 .

[19]  Luca Trevisan,et al.  Approximating the Expansion Profile and Almost Optimal Local Graph Clustering , 2012, 2012 IEEE 53rd Annual Symposium on Foundations of Computer Science.

[20]  Jitendra Malik,et al.  Normalized cuts and image segmentation , 1997, Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[21]  Nisheeth K. Vishnoi,et al.  Approximating the exponential, the Lanczos method and an Õ(m)-time spectral algorithm for balanced separator , 2011, STOC '12.

[22]  Shang-Hua Teng,et al.  Nearly-linear time algorithms for graph partitioning, graph sparsification, and solving linear systems , 2003, STOC '04.

[23]  Nisheeth K. Vishnoi,et al.  Towards an SDP-based approach to spectral methods: a nearly-linear-time algorithm for graph partitioning and decomposition , 2010, SODA '11.

[24]  Luca Trevisan,et al.  Improved Cheeger's inequality: analysis of spectral partitioning algorithms through higher order spectral gap , 2013, STOC '13.

[25]  Robert Krauthgamer,et al.  Min-max Graph Partitioning and Small Set Expansion , 2011, 2011 IEEE 52nd Annual Symposium on Foundations of Computer Science.

[26]  Lap Chi Lau,et al.  Finding Small Sparse Cuts by Random Walk , 2012, APPROX-RANDOM.

[27]  Dan Suciu,et al.  Journal of the ACM , 2006 .

[28]  Pan Peng,et al.  Community Structures in Classical Network Models , 2011, Internet Math.

[29]  Sanjeev Arora,et al.  Subexponential Algorithms for Unique Games and Related Problems , 2010, 2010 IEEE 51st Annual Symposium on Foundations of Computer Science.

[30]  Ravi Montenegro.  Sharp edge, vertex, and mixed Cheeger type inequalities for finite Markov kernels , 2006 .

[31]  Kevin J. Lang,et al.  Communities from seed sets , 2006, WWW '06.

[32]  Jure Leskovec,et al.  Statistical properties of community structure in large social and information networks , 2008, WWW.

[33]  Ryan O'Donnell,et al.  Improved small-set expansion from higher eigenvalues , 2012, ArXiv.

[34]  Noga Alon,et al.  λ1, Isoperimetric inequalities for graphs, and superconcentrators , 1985, J. Comb. Theory, Ser. B.

[35]  Ronald L. Rivest,et al.  Introduction to Algorithms, third edition , 2009 .

[36]  Prasad Raghavendra,et al.  Graph expansion and the unique games conjecture , 2010, STOC '10.

[37]  Santosh S. Vempala,et al.  On clusterings-good, bad and spectral , 2000, Proceedings 41st Annual Symposium on Foundations of Computer Science.

[38]  Lap Chi Lau,et al.  Random Walks and Evolving Sets: Faster Convergences and Limitations , 2015, SODA.

[39]  Pan Peng,et al.  Testing Conductance in General Graphs , 2011, Electron. Colloquium Comput. Complex.

[40]  Xin-She Yang,et al.  Introduction to Algorithms , 2021, Nature-Inspired Optimization Algorithms.

[41]  Yuval Peres,et al.  Finding sparse cuts locally using evolving sets , 2008, STOC '09.

[42]  C. Smith,et al.  An Inequality Arising in Genetical Theory , 1959 .

[43]  Robert Krauthgamer,et al.  Min-Max Graph Partitioning and Small Set Expansion , 2014, SIAM J. Comput.

[44]  Jure Leskovec,et al.  Community Structure in Large Networks: Natural Cluster Sizes and the Absence of Large Well-Defined Clusters , 2008, Internet Math.