EgoLP: Fast and Distributed Community Detection in Billion-Node Social Networks

Community structure is one of the most important and characteristic features of social networks. Numerous methods for discovering implicit user communities from a social graph have been proposed in recent years. However, most of them have performance and scalability issues that make them difficult to apply to population-wide analysis of modern social networks (billions of users and growing). In this paper we present EgoLP, an efficient and fully distributed method for social community detection. The method propagates community labels through the network with the help of the friendship groups of individual users. Experimental evaluation of an Apache Spark implementation of the method showed that it outperforms some state-of-the-art methods in terms of a) similarity of the extracted communities to the reference ones in synthetic networks, b) precision of user attribute prediction in Facebook based solely on community memberships, and c) likelihood of the discovered community structure according to the proposed generative model. At the same time, the method retains near-linear complexity in the number of edges and is thus applicable to social graphs of up to 10^9 users.
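The abstract does not spell out the steps of EgoLP, so the following is only a minimal sketch of the generic label-propagation idea it builds on, not the authors' method: every node starts with its own label and repeatedly adopts the most frequent label among its neighbours. The function name `propagate_labels`, the adjacency-dictionary input, the fixed node order, and the rule of breaking ties by the largest label are all assumptions made for illustration. The friendship-group step and the distributed Spark execution are deliberately left out.

```python
from collections import Counter

def propagate_labels(adjacency, max_rounds=20):
    """Generic label propagation (NOT EgoLP itself; an illustrative assumption).

    adjacency: dict mapping each node to a list of its neighbours.
    Returns a dict mapping each node to its final community label.
    """
    # Every node starts in its own singleton community.
    labels = {node: node for node in adjacency}
    for _ in range(max_rounds):
        changed = False
        # A fixed visiting order keeps this sketch deterministic.
        for node in sorted(adjacency):
            neighbours = adjacency[node]
            if not neighbours:
                continue
            counts = Counter(labels[n] for n in neighbours)
            top = max(counts.values())
            # Tie-break by the largest label (arbitrary choice for reproducibility).
            best = max(label for label, c in counts.items() if c == top)
            if best != labels[node]:
                labels[node] = best
                changed = True
        if not changed:
            break  # labels are stable, so stop early
    return labels

# Two triangles joined by a single bridge edge (2-3).
example_graph = {
    0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
    3: [2, 4, 5], 4: [3, 5], 5: [3, 4],
}
example_labels = propagate_labels(example_graph)
```

Each round touches every edge a constant number of times, which is the intuition behind the near-linear cost in the number of edges mentioned above. The outcome of plain label propagation depends on visiting order and tie-breaking, so a real implementation would typically randomize or otherwise control both.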
