Distributed Graph Clustering by Load Balancing

Graph clustering is a fundamental computational problem with applications in algorithm design, machine learning, data mining, and the analysis of social networks. Over the past decades, researchers have proposed many algorithmic techniques for graph clustering. However, most of these methods are based on complicated spectral techniques or convex optimisation, and cannot be applied directly to many networks that occur in practice, whose data is often collected at different sites. Designing a simple and distributed clustering algorithm is therefore of great interest, and has wide applications in processing big datasets. In this paper we present a simple and distributed algorithm for graph clustering: for a wide class of graphs that are characterised by a strong cluster-structure, our algorithm finishes in a poly-logarithmic number of rounds and recovers a partition of the graph close to an optimal partition. The main component of our algorithm is an application of the random matching model of load balancing, a fundamental protocol in distributed computing that has been studied extensively over the past 20 years. Hence, our result highlights an intrinsic connection between graph clustering and load balancing. At a technical level, we present a purely algebraic result characterising the early behaviour of load balancing processes on graphs exhibiting a cluster-structure. We believe that this result can be further applied to analyse other gossip processes, such as rumour spreading and averaging processes.
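To make the random matching model mentioned above concrete, the following is a minimal Python sketch of one common formulation of the protocol; it is not the paper's clustering algorithm. It assumes that each node is independently active with probability 1/2, that each active node proposes to a uniformly random neighbour, and that a non-active node receiving exactly one proposal is matched with its proposer. Matched pairs then average their loads. The function names, the two-clique test graph, and the round count are illustrative assumptions.

```python
import random

def random_matching(adj, rng):
    """Sample a matching (assumed variant of the random matching model)."""
    # Each node is independently active with probability 1/2.
    active = {v: rng.random() < 0.5 for v in adj}
    proposals = {}
    for v in adj:
        if active[v] and adj[v]:
            u = rng.choice(adj[v])      # active node proposes to a random neighbour
            if not active[u]:           # only non-active nodes accept proposals
                proposals.setdefault(u, []).append(v)
    # A non-active node with exactly one proposal is matched with the proposer.
    return [(u, vs[0]) for u, vs in proposals.items() if len(vs) == 1]

def balance_round(load, adj, rng):
    """One round: matched pairs replace their loads by the pair's average."""
    for u, v in random_matching(adj, rng):
        avg = (load[u] + load[v]) / 2.0
        load[u] = load[v] = avg

def two_cliques(k):
    """Hypothetical test graph: two k-cliques joined by a single edge."""
    adj = {v: [] for v in range(2 * k)}
    for block in (range(k), range(k, 2 * k)):
        for a in block:
            for b in block:
                if a != b:
                    adj[a].append(b)
    adj[k - 1].append(k)
    adj[k].append(k - 1)
    return adj

rng = random.Random(0)
adj = two_cliques(8)
load = {v: 0.0 for v in adj}
load[0] = 1.0                           # all load starts at one seed node
for _ in range(30):                     # illustrative number of rounds
    balance_round(load, adj, rng)

left = sum(load[v] for v in range(8))   # load held by the seed's clique
right = sum(load[v] for v in range(8, 16))
print(f"load in seed clique: {left:.3f}, in other clique: {right:.3f}")
```

Averaging conserves the total load, so the two printed values sum to 1 up to rounding. On a graph like this one, where the two cliques are joined by a single edge, one would expect most of the load to remain in the seed's clique during the early rounds, which is the kind of early behaviour the abstract refers to.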
