Local Graph Clustering Beyond Cheeger's Inequality

Motivated by applications of large-scale graph clustering, we study random-walk-based local algorithms whose running times depend only on the size of the output cluster rather than on the entire graph. All previously known algorithms of this kind guarantee an output conductance of $\tilde{O}(\sqrt{\phi(A)})$ when the target set $A$ has conductance $\phi(A)\in[0,1]$. In this paper, we improve this guarantee to $$\tilde{O}\bigg( \min\Big\{\sqrt{\phi(A)}, \frac{\phi(A)}{\sqrt{\mathsf{Conn}(A)}} \Big\} \bigg)\enspace, $$ where the internal connectivity parameter $\mathsf{Conn}(A) \in [0,1]$ is defined as the reciprocal of the mixing time of the random walk over the induced subgraph on $A$. For instance, using $\mathsf{Conn}(A) = \Omega(\lambda(A) / \log n)$, where $\lambda(A)$ is the second eigenvalue of the Laplacian of the induced subgraph on $A$, our conductance guarantee can be as good as $\tilde{O}(\phi(A)/\sqrt{\lambda(A)})$. This establishes a connection to the recent improved Cheeger's inequality [KKL+13], which says that global spectral algorithms can provide a conductance guarantee of $O(\phi_{\mathsf{opt}}/\sqrt{\lambda_3})$ instead of $O(\sqrt{\phi_{\mathsf{opt}}})$. In addition, we provide a theoretical guarantee on the clustering accuracy (in terms of precision and recall) of the output set. We also prove that our analysis is tight, and we perform an empirical evaluation on both synthetic and real data to support our theory. Our analysis outperforms prior work when the cluster is well-connected; in fact, the better connected the cluster is internally, the more significant the improvement we obtain, in both conductance and accuracy. Our results shed light on why, in practice, some random-walk-based algorithms perform better than their previous theory predicts, and they help guide future research on local clustering.
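To make the two guarantees concrete, the following is a minimal Python sketch, not the algorithm from the paper. It assumes the standard definition of conductance, $\phi(A) = |E(A, V\setminus A)| / \min\{\mathrm{vol}(A), \mathrm{vol}(V\setminus A)\}$, which the abstract does not spell out, and it drops the polylogarithmic factors hidden by $\tilde{O}$. The helper names `conductance` and `guarantee_bounds`, the toy graph, and the value used for $\mathsf{Conn}(A)$ are all hypothetical and chosen only for illustration.

```python
import math

def conductance(adj, A):
    """Conductance of vertex set A in an undirected graph given as an
    adjacency-list dict (assumed standard definition: cut / min volume)."""
    A = set(A)
    cut = sum(1 for u in A for v in adj[u] if v not in A)
    vol_A = sum(len(adj[u]) for u in A)
    vol_rest = sum(len(adj[u]) for u in adj) - vol_A
    return cut / min(vol_A, vol_rest)

def guarantee_bounds(phi, conn):
    """Return (sqrt(phi), min{sqrt(phi), phi / sqrt(conn)}),
    ignoring the polylog factors hidden in the tilde-O notation."""
    old = math.sqrt(phi)
    new = min(old, phi / math.sqrt(conn))
    return old, new

# Toy graph: two 4-cliques joined by the single edge (3, 4).
adj = {
    0: [1, 2, 3], 1: [0, 2, 3], 2: [0, 1, 3], 3: [0, 1, 2, 4],
    4: [3, 5, 6, 7], 5: [4, 6, 7], 6: [4, 5, 7], 7: [4, 5, 6],
}

phi = conductance(adj, {0, 1, 2, 3})   # one cut edge, volume 13 on each side
conn = 0.5                             # hypothetical Conn(A), not computed here
old, new = guarantee_bounds(phi, conn)
print(f"phi(A) = {phi:.4f}, sqrt bound = {old:.4f}, improved bound = {new:.4f}")
```

When $\mathsf{Conn}(A)$ is large relative to $\phi(A)$, the second term of the minimum is the smaller one, which is the well-connected regime the abstract describes; when $\mathsf{Conn}(A) \le \phi(A)$, the minimum falls back to $\sqrt{\phi(A)}$.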

[1]  Zeyuan Allen Zhu,et al.  Flow-Based Algorithms for Local Graph Clustering , 2013, SODA.

[2]  Silvio Lattanzi,et al.  SoK: The Evolution of Sybil Defense via Social Networks , 2013, 2013 IEEE Symposium on Security and Privacy.

[3]  Silvio Lattanzi,et al.  A Local Algorithm for Finding Well-Connected Clusters , 2013, ICML.

[4]  Luca Trevisan,et al.  Improved Cheeger's inequality: analysis of spectral partitioning algorithms through higher order spectral gap , 2013, STOC '13.

[5]  Shih-Fu Chang,et al.  Learning with Partially Absorbing Random Walks , 2012, NIPS.

[6]  David F. Gleich,et al.  Vertex neighborhoods, low conductance cuts, and good seeds for local community methods , 2012, KDD.

[7]  Aravindan Vijayaraghavan,et al.  Approximation algorithms for semi-random partitioning problems , 2012, STOC '12.

[8]  Luca Trevisan,et al.  Approximating the Expansion Profile and Almost Optimal Local Graph Clustering , 2012, 2012 IEEE 53rd Annual Symposium on Foundations of Computer Science.

[9]  Vahab Mirrokni,et al.  Overlapping clusters for distributed computation , 2012, WSDM '12.

[10]  Nisheeth K. Vishnoi,et al.  Approximating the exponential, the Lanczos method and an Õ(m)-time spectral algorithm for balanced separator , 2011, STOC '12.

[11]  Vahab S. Mirrokni,et al.  Large-Scale Community Detection on YouTube for Topic Discovery and Exploration , 2011, ICWSM.

[12]  Ulrike von Luxburg,et al.  Multi-agent Random Walks for Local Clustering on Graphs , 2010, 2010 IEEE International Conference on Data Mining.

[13]  William W. Cohen,et al.  Power Iteration Clustering , 2010, ICML.

[14]  Jure Leskovec,et al.  Empirical comparison of algorithms for network community detection , 2010, WWW '10.

[15]  Haixun Wang,et al.  Inverse Time Dependency in Convex Regularized Learning , 2009, 2009 Ninth IEEE International Conference on Data Mining.

[16]  Jonah Sherman,et al.  Breaking the Multicommodity Flow Barrier for O(√log n)-Approximations to Sparsest Cut , 2009, 2009 50th Annual IEEE Symposium on Foundations of Computer Science.

[17]  Yuval Peres,et al.  Finding sparse cuts locally using evolving sets , 2008, STOC '09.

[18]  Jure Leskovec,et al.  Community Structure in Large Networks: Natural Cluster Sizes and the Absence of Large Well-Defined Clusters , 2008, Internet Math..

[19]  Shang-Hua Teng,et al.  A Local Clustering Algorithm for Massive Graphs and Its Application to Nearly Linear Time Graph Partitioning , 2008, SIAM J. Comput..

[20]  Nathan Srebro,et al.  SVM optimization: inverse dependence on training set size , 2008, ICML '08.

[21]  Nisheeth K. Vishnoi,et al.  On partitioning graphs via single commodity flows , 2008, STOC.

[22]  Kevin J. Lang,et al.  An algorithm for improving graph partitions , 2008, SODA '08.

[23]  Sanjeev Arora,et al.  A combinatorial, primal-dual approach to semidefinite programs , 2007, STOC '07.

[24]  Fan Chung Graham,et al.  Using PageRank to Locally Partition a Graph , 2007, Internet Math..

[25]  Kevin J. Lang,et al.  Communities from seed sets , 2006, WWW '06.

[26]  Yuval Rabani,et al.  On the Hardness of Approximating Multicut and Sparsest-Cut , 2005, 20th Annual IEEE Conference on Computational Complexity (CCC'05).

[27]  Elad Hazan,et al.  O(√log n) Approximation to SPARSEST CUT in Õ(n²) Time , 2004, SIAM J. Comput..

[28]  Satish Rao,et al.  Expander flows, geometric embeddings and graph partitioning , 2004, STOC '04.

[29]  Satish Rao,et al.  A Flow-Based Method for Improving the Expansion or Conductance of Graph Cuts , 2004, IPCO.

[30]  Shang-Hua Teng,et al.  Nearly-linear time algorithms for graph partitioning, graph sparsification, and solving linear systems , 2003, STOC '04.

[31]  Yuval Peres,et al.  Evolving sets and mixing , 2003, STOC '03.

[32]  Jennifer Widom,et al.  Scaling personalized web search , 2003, WWW '03.

[33]  Taher H. Haveliwala.  Topic-sensitive PageRank , 2002, IEEE Trans. Knowl. Data Eng..

[34]  Santosh S. Vempala,et al.  On clusterings: good, bad and spectral , 2000, Proceedings 41st Annual Symposium on Foundations of Computer Science.

[35]  Frank Thomson Leighton,et al.  Multicommodity max-flow min-cut theorems and their use in designing approximation algorithms , 1999, JACM.

[36]  Jitendra Malik,et al.  Normalized cuts and image segmentation , 1997, Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[37]  Miklós Simonovits,et al.  The mixing rate of Markov chains, an isoperimetric inequality, and computing the volume , 1990, Proceedings [1990] 31st Annual Symposium on Foundations of Computer Science.

[38]  Mark Jerrum,et al.  Approximate Counting, Uniform Generation and Rapidly Mixing Markov Chains , 1987, WG.

[39]  N. Alon.  Eigenvalues and expanders , 1986, Comb..

[40]  Mircea Merca,et al.  A Note on Cosine Power Sums , 2012 .

[41]  Charu C. Aggarwal,et al.  Graph Clustering , 2010, Encyclopedia of Machine Learning and Data Mining.

[42]  Russ Bubley,et al.  Randomized algorithms , 1995, CSUR.

[43]  Miklós Simonovits,et al.  Random Walks in a Convex Body and an Improved Volume Algorithm , 1993, Random Struct. Algorithms.