Speeding Up BigClam Implementation on SNAP

We perform a detailed analysis of the C++ implementation of the Cluster Affiliation Model for Big Networks (BigClam) on the Stanford Network Analysis Project (SNAP). BigClam is a popular graph mining algorithm that is capable of finding overlapping communities in networks containing millions of nodes. Our analysis shows a key stage of the algorithm - determining if a node belongs to a community - dominates the runtime of the implementation, yet the computation is not parallelized. We show that by parallelizing computations across multiple threads using OpenMP we can speed up the algorithm by 5.3 times when solving large networks for communities, while preserving the integrity of the program and the result.

[1]  David Dominguez-Sal,et al.  Distributed Community Detection with the WCC Metric , 2014, WWW.

[2]  Lei Cao,et al.  Pivot-Based Distributed K-Nearest Neighbor Mining , 2017, ECML/PKDD.

[3]  Inderjit S. Dhillon,et al.  Overlapping community detection using seed set expansion , 2013, CIKM.

[4]  Matthieu Latapy,et al.  Computing Communities in Large Networks Using Random Walks , 2004, J. Graph Algorithms Appl..

[5]  Jean-Loup Guillaume,et al.  Fast unfolding of communities in large networks , 2008, 0803.0476.

[6]  Mark Newman,et al.  Detecting community structure in networks , 2004 .

[7]  Duncan J. Watts,et al.  Collective dynamics of ‘small-world’ networks , 1998, Nature.

[8]  Giovanni Montana,et al.  Community detection in multiplex networks using Locally Adaptive Random walks , 2015, 2015 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM).

[9]  Reynold Xin,et al.  GraphX: a resilient distributed graph system on Spark , 2013, GRADES.

[10]  Andrew Pavlo,et al.  An Empirical Evaluation of In-Memory Multi-Version Concurrency Control , 2017, Proc. VLDB Endow..

[11]  Stephen Roberts,et al.  Overlapping community detection using Bayesian non-negative matrix factorization. , 2011, Physical review. E, Statistical, nonlinear, and soft matter physics.

[12]  Rok Sosic,et al.  SNAP , 2016, ACM Trans. Intell. Syst. Technol..

[13]  Michael Mitzenmacher,et al.  A Brief History of Generative Models for Power Law and Lognormal Distributions , 2004, Internet Math..

[14]  Jon M. Kleinberg,et al.  The link-prediction problem for social networks , 2007, J. Assoc. Inf. Sci. Technol..

[15]  David F. Gleich,et al.  Vertex neighborhoods, low conductance cuts, and good seeds for local community methods , 2012, KDD.

[16]  Patrik O. Hoyer,et al.  Non-negative Matrix Factorization with Sparseness Constraints , 2004, J. Mach. Learn. Res..

[17]  Douglas Thain,et al.  Distributed computing in practice: the Condor experience , 2005, Concurr. Pract. Exp..

[18]  Pushmeet Kohli,et al.  Personality and patterns of Facebook usage , 2012, WebSci '12.

[19]  Santo Fortunato,et al.  Community detection in graphs , 2009, ArXiv.

[20]  M E J Newman,et al.  Finding and evaluating community structure in networks. , 2003, Physical review. E, Statistical, nonlinear, and soft matter physics.

[21]  Fan Chung Graham,et al.  Local Graph Partitioning using PageRank Vectors , 2006, 2006 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS'06).

[22]  Santosh S. Vempala,et al.  On clusterings-good, bad and spectral , 2000, Proceedings 41st Annual Symposium on Foundations of Computer Science.

[23]  T.S.Evans,et al.  Line graphs of weighted networks for overlapping communities , 2009, 0912.4389.

[24]  Alexandru Iosup,et al.  A Performance Analysis of EC2 Cloud Computing Services for Scientific Computing , 2009, CloudComp.

[25]  Marc Peter Deisenroth,et al.  Probabilistic Inference of Twitter Users' Age Based on What They Follow , 2016, ECML/PKDD.

[26]  Martin C. Rinard,et al.  Synchronization transformations for parallel computing , 1999, POPL '97.

[27]  Meng Wang,et al.  Community Detection in Social Networks: An In-depth Benchmarking Study with a Procedure-Oriented Framework , 2015, Proc. VLDB Endow..

[28]  Jure Leskovec,et al.  Community-Affiliation Graph Model for Overlapping Network Community Detection , 2012, 2012 IEEE 12th International Conference on Data Mining.

[29]  Ângelo Cardoso,et al.  Generalising Random Forest Parameter Optimisation to Include Stability and Cost , 2017, ECML/PKDD.

[30]  David Lusseau,et al.  The emergent properties of a dolphin social network , 2003, Proceedings of the Royal Society of London. Series B: Biological Sciences.

[31]  Andrei Z. Broder,et al.  Anatomy of the long tail: ordinary people with extraordinary tastes , 2010, WSDM '10.

[32]  Rajeev Motwani,et al.  The PageRank Citation Ranking : Bringing Order to the Web , 1999, WWW 1999.

[33]  M E J Newman,et al.  Community structure in social and biological networks , 2001, Proceedings of the National Academy of Sciences of the United States of America.

[34]  Jure Leskovec,et al.  {SNAP Datasets}: {Stanford} Large Network Dataset Collection , 2014 .

[35]  Alex Pentland,et al.  Detecting Anomalous Behaviors Using Structural Properties of Social Networks , 2013, SBP.

[36]  Jure Leskovec,et al.  Overlapping community detection at scale: a nonnegative matrix factorization approach , 2013, WSDM.

[37]  Din J. Wasem,et al.  Mining of Massive Datasets , 2014 .

[38]  Alex Pothen,et al.  Graph Partitioning Algorithms with Applications to Scientific Computing , 1997 .

[39]  Michael R. Lyu,et al.  Incorporating Implicit Link Preference Into Overlapping Community Detection , 2015, AAAI.