Distributed Graph Clustering and Sparsification

Graph clustering is a fundamental computational problem with applications in algorithm design, machine learning, data mining, and the analysis of social networks. Over the past decades, researchers have proposed many algorithmic approaches to graph clustering. Most of these methods, however, rely on complicated spectral techniques or convex optimisation, and cannot be applied directly to many networks that occur in practice, whose information is often collected at different sites. Designing a simple, distributed clustering algorithm is therefore of great interest and has broad applications in processing big datasets. In this article, we present a simple distributed algorithm for graph clustering: for a wide class of graphs characterised by a strong cluster structure, our algorithm finishes in a poly-logarithmic number of rounds and recovers a partition of the graph that is close to optimal. One of the main procedures behind our algorithm is a sampling scheme that, given a dense graph as input, produces a sparse subgraph that provably preserves the cluster structure of the input. Compared with previous sparsification algorithms, which require Laplacian solvers or involve combinatorial constructions, this procedure is easy to implement in a distributed setting and runs fast in practice.
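The flavour of such a sampling scheme can be illustrated with a minimal sketch of degree-based edge sampling with reweighting. It assumes an undirected, unweighted simple graph given as an edge list, and a per-endpoint probability p_u = min(c·log n / d_u, 1), where d_u is the degree of u. The helper name `sparsify` and the constant `c` are hypothetical, and the probability used in the article may differ (for instance, it may involve further graph parameters); this is not presented as the authors' exact procedure.

```python
import math
import random
from collections import defaultdict


def sparsify(edges, c=1.0, seed=None):
    """Hypothetical degree-based edge sampler (illustrative sketch only).

    Each endpoint u of an edge (u, v) independently proposes the edge with
    probability p_u = min(c * log(n) / deg(u), 1). The edge is kept if at
    least one endpoint proposes it, and a kept edge receives weight 1 / p,
    where p is the probability of being kept, so its expected weight is 1.
    """
    rng = random.Random(seed)

    # Degrees of an undirected simple graph given as an edge list.
    deg = defaultdict(int)
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1

    # n counts the vertices that appear in at least one edge.
    n = len(deg)
    log_n = math.log(max(n, 2))

    sparse = {}
    for u, v in edges:
        # Assumed sampling probabilities; low-degree vertices keep more edges.
        p_u = min(c * log_n / deg[u], 1.0)
        p_v = min(c * log_n / deg[v], 1.0)
        # Probability that at least one endpoint proposes the edge.
        p = p_u + p_v - p_u * p_v
        # Each endpoint flips its own coin, so the decision needs only local
        # information; the edge is kept with probability p_u + (1 - p_u) * p_v = p.
        if rng.random() < p_u or rng.random() < p_v:
            sparse[(u, v)] = 1.0 / p
    return sparse
```

Because every vertex decides only about its own incident edges, this style of sampling needs no global computation such as a Laplacian solver, which is what makes it easy to run in a distributed setting. Reweighting each kept edge by 1/p keeps the expected weight of every edge equal to its original weight.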