Community Discovery: Simple and Scalable Approaches

The increasing size and complexity of online social networks pose distinct challenges for community discovery. A community discovery algorithm needs to be efficient, finishing without a prohibitive amount of time. It should also be scalable, capable of handling large networks containing billions of edges or more. Furthermore, it should be effective, producing community assignments of high quality. In this chapter, we present a selection of algorithms that follow simple design principles and have proven highly effective and efficient in extensive empirical evaluations. We start by discussing a generic approach to community discovery that combines multilevel graph contraction with core clustering algorithms. Next, we describe the use of network sampling in community discovery, where the goal is to reduce the number of nodes and/or edges while retaining the network’s underlying community structure. Finally, we review research efforts that leverage various parallel and distributed computing paradigms for community discovery, which can facilitate finding communities in tera- and peta-scale networks.
