A Short Introduction to Local Graph Clustering Methods and Software

Graph clustering has many important applications in computing, but as graphs grow in size, even traditionally fast clustering methods can become computationally expensive on real-world graphs of interest. These scalability problems led to the development of local graph clustering algorithms, which come with a variety of theoretical guarantees. Rather than returning a global clustering of the entire graph, a local clustering algorithm returns a single cluster around a given seed node or set of seed nodes. These algorithms improve scalability because their time and memory requirements depend only on the size of the cluster returned, not on the size of the input graph. Indeed, for many of them, the running time grows linearly with the size of the output. Beyond scalability, local graph clustering algorithms have proven very useful for identifying and interpreting small-scale and meso-scale structure in large-scale graphs. Unlike heuristic operational procedures, this class of algorithms comes with strong algorithmic and statistical theory, including statistical guarantees showing that the algorithms have implicit regularization properties. One challenge with the existing literature on these approaches is that it is spread across a wide variety of areas, including theoretical computer science, statistics, data science, and mathematics, which has made it difficult to relate the various algorithms and ideas into a cohesive whole. We have recently been working on unifying these diverse perspectives through the lens of optimization, as well as on providing software to perform these computations in a cohesive fashion. In this note, we provide a brief introduction to local graph clustering, present some representative examples of our perspective, and introduce our software, named Local Graph Clustering (LGC).
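To illustrate what returning "a single cluster around a given seed node" can look like in practice, here is a minimal sketch. It assumes an unweighted, undirected graph with no isolated nodes, stored as a dictionary of adjacency lists. It uses one well-known local method: a push-style approximation of personalized PageRank followed by a sweep cut over conductance. It is not the API of the LGC software; the function names `approximate_ppr` and `sweep_cut`, the parameters `alpha` and `eps`, and the toy graph are illustrative assumptions.

```python
from collections import deque


def approximate_ppr(adj, seed, alpha=0.15, eps=1e-4):
    """Push-style approximate personalized PageRank (illustrative sketch).

    Only nodes whose residual exceeds eps * degree are ever processed,
    so the work depends on the explored region, not on the whole graph.
    """
    p = {}                 # approximate PageRank mass
    r = {seed: 1.0}        # residual mass still to be distributed
    queue = deque([seed])
    while queue:
        u = queue.popleft()
        deg_u = len(adj[u])
        r_u = r.get(u, 0.0)
        if r_u < eps * deg_u:
            continue       # stale queue entry, nothing left to push
        # Lazy push: keep alpha * r_u, hold half of the rest, spread the other half.
        p[u] = p.get(u, 0.0) + alpha * r_u
        r[u] = (1 - alpha) * r_u / 2
        share = (1 - alpha) * r_u / (2 * deg_u)
        for v in adj[u]:
            r[v] = r.get(v, 0.0) + share
            if r[v] >= eps * len(adj[v]):
                queue.append(v)
        if r[u] >= eps * deg_u:
            queue.append(u)
    return p


def sweep_cut(adj, p):
    """Return the prefix of nodes (ordered by p[v] / degree) with lowest conductance."""
    # Total volume is computed here for simplicity; this one step touches the
    # whole graph and would normally be precomputed once.
    total_vol = sum(len(nbrs) for nbrs in adj.values())
    order = sorted(p, key=lambda v: p[v] / len(adj[v]), reverse=True)
    in_set = set()
    vol = cut = 0
    best_cond, best_size = float("inf"), 0
    for i, v in enumerate(order, start=1):
        deg_v = len(adj[v])
        inside = sum(1 for w in adj[v] if w in in_set)
        vol += deg_v
        cut += deg_v - 2 * inside   # edges to the set stop being cut edges
        in_set.add(v)
        denom = min(vol, total_vol - vol)
        if denom > 0 and cut / denom < best_cond:
            best_cond, best_size = cut / denom, i
    return order[:best_size], best_cond


# Toy graph (an assumption for illustration): two 4-cliques joined by edge 3-4.
adj = {
    0: [1, 2, 3], 1: [0, 2, 3], 2: [0, 1, 3], 3: [0, 1, 2, 4],
    4: [3, 5, 6, 7], 5: [4, 6, 7], 6: [4, 5, 7], 7: [4, 5, 6],
}
cluster, conductance = sweep_cut(adj, approximate_ppr(adj, seed=0))
print(sorted(cluster), conductance)
```

The push loop only visits nodes that receive enough residual mass from the seed, which is how the cost stays tied to the size of the explored region. Smaller values of `eps` explore more of the graph and give a closer approximation.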
