p-Norm Flow Diffusion for Local Graph Clustering

Local graph clustering and the closely related seed set expansion problem are primitives on graphs that are central to a wide range of analytic and learning tasks such as local clustering, community detection, node ranking and feature inference. Prior work on local graph clustering mostly falls into two categories, with numerical and combinatorial roots respectively. In this work, we draw inspiration from both fields and propose a family of convex optimization formulations based on the idea of diffusion with p-norm network flow for $p\in (1,\infty)$. In the context of local clustering, we characterize the optimal solutions of these optimization problems and show their usefulness in finding low-conductance cuts around an input seed set. In particular, we achieve a quadratic approximation of conductance in the case of $p=2$, similar to the Cheeger-type bounds of spectral methods, a constant-factor approximation when $p\rightarrow\infty$, similar to max-flow based methods, and a smooth transition for general $p$ values in between. Thus, our optimization formulation can be viewed as bridging the numerical and combinatorial approaches, and we can achieve the best of both worlds in terms of speed and noise robustness. We show that the proposed problem can be solved in strongly local running time for $p\ge 2$, and we conduct empirical evaluations on both synthetic and real-world graphs to illustrate that our approach compares favorably with existing methods.
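To make the notion of a low-conductance cut concrete, the following is a minimal Python sketch, not the paper's algorithm. It computes the standard conductance of a node set (cut edges divided by the smaller of the two volumes) and performs a sweep over nodes ordered by a score vector, which is the usual way a diffusion output is rounded to a cluster. The toy graph, the `scores` values, and the function names `conductance` and `sweep_cut` are assumptions made for illustration; in practice the scores would come from a diffusion such as the one proposed here.

```python
def conductance(adj, S):
    """Conductance of node set S in an undirected, unweighted graph.

    adj maps each node to the set of its neighbors.
    """
    S = set(S)
    vol_S = sum(len(adj[u]) for u in S)
    vol_total = sum(len(nbrs) for nbrs in adj.values())
    # Count edges with exactly one endpoint in S.
    cut = sum(1 for u in S for v in adj[u] if v not in S)
    denom = min(vol_S, vol_total - vol_S)
    return cut / denom if denom > 0 else float("inf")


def sweep_cut(adj, scores):
    """Return the prefix set (by decreasing score) with the lowest conductance."""
    order = sorted((u for u in scores if scores[u] > 0), key=lambda u: -scores[u])
    best_set, best_phi = set(), float("inf")
    prefix = set()
    for u in order:
        prefix.add(u)
        phi = conductance(adj, prefix)
        if phi < best_phi:
            best_set, best_phi = set(prefix), phi
    return best_set, best_phi


# Toy graph (hypothetical): two triangles joined by the single edge 2-3.
adj = {
    0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
    3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4},
}
# Hypothetical scores standing in for a diffusion seeded at node 0.
scores = {0: 1.0, 1: 0.9, 2: 0.8, 3: 0.1}

best_set, best_phi = sweep_cut(adj, scores)
print(best_set, best_phi)
```

On this toy graph the sweep recovers the seed's triangle, whose only boundary edge is 2-3. The sketch recomputes conductance from scratch at every prefix for clarity; an incremental update of the cut and volume would be the natural choice on large graphs.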
