Flow-Based Local Graph Clustering with Better Seed Set Inclusion

Flow-based methods for local graph clustering have received significant recent attention for their theoretical cut improvement and runtime guarantees. In this work we present two improvements for using flow-based methods in real-world semi-supervised clustering problems. Our first contribution is a generalized objective function that allows practitioners to place strict and soft penalties on excluding specific seed nodes from the output set. This feature lets us avoid a tendency often exhibited by previous flow-based methods: contracting a large seed set into a small set of nodes that does not contain all, or even most, of the seed nodes. Our second contribution is a fast algorithm for minimizing our generalized objective function, based on a variant of the push-relabel algorithm for computing preflows. We speed up this algorithm by implementing a global relabeling heuristic and employing a warm-start procedure to quickly solve related cut problems. In practice, our algorithm is faster than previous related flow-based methods and more robust in detecting ground-truth target regions in a graph, thanks to its ability to better incorporate semi-supervised information about target clusters.
