Uncovering Local Hierarchical Overlapping Communities at Scale

Real-life systems involving interacting objects are typically modeled as graphs, which can grow very large. Revealing the community structure of such systems is crucial for understanding their complex nature. However, the ever-increasing size of real-world graphs and our evolving perception of what a community is make community detection very challenging. A closely related and critical challenge is discovering the possibly overlapping communities of a given node in a billion-node graph, a problem that arises frequently in modern large social networks such as Facebook and LinkedIn. In this work, we propose a scalable local community detection approach that efficiently unfolds the communities of individual target nodes in a given network. Our goal is to reveal the clusters formed around nodes (e.g., users) by leveraging their relations within all the different contexts in which these nodes participate. Our approach, termed Local Dispersion-aware Link Communities (LDLC), considers the similarity of pairs of links in the graph as well as the extent to which these links participate in multiple contexts. It then determines the order in which pairs of links should be grouped so that meaningful hierarchical communities form. Unlike previous techniques, LDLC is not constrained by the need for several seed nodes or by the need to collapse multiple overlapping communities into a single community. Our experimental evaluation, using ground-truth communities for a wide range of large real-world networks, shows that LDLC significantly outperforms state-of-the-art methods in both accuracy and efficiency. Moreover, we show that LDLC effectively uncovers the hierarchical structure of overlapping communities by producing detailed dendrograms.
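The idea of grouping pairs of links into hierarchical communities can be illustrated with a minimal Python sketch. This sketch is not LDLC. It omits the dispersion component entirely and, as a simplifying assumption, clusters only the links incident to the target node. Link similarity follows the Jaccard measure of Ahn, Bagrow, and Lehmann. Two links (k, i) and (k, j) that share node k score |n+(i) ∩ n+(j)| / |n+(i) ∪ n+(j)|, where n+(v) is v together with its neighbors. Pairs are merged in decreasing order of similarity (single linkage via union-find), and each merge records one dendrogram level. The function names are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple

Graph = Dict[int, Set[int]]  # undirected graph as an adjacency map


def inclusive_neighbors(g: Graph, v: int) -> Set[int]:
    """n+(v): the neighbors of v together with v itself."""
    return g[v] | {v}


def link_similarity(g: Graph, i: int, j: int) -> float:
    """Jaccard similarity of n+(i) and n+(j) for links (k, i), (k, j) sharing node k."""
    a, b = inclusive_neighbors(g, i), inclusive_neighbors(g, j)
    return len(a & b) / len(a | b)


def local_link_dendrogram(g: Graph, target: int) -> List[Tuple[float, List[Set[int]]]]:
    """Hypothetical helper: merge the links incident to `target` by decreasing similarity.

    Simplification (assumption): only links incident to `target` are clustered,
    and no dispersion term is used. Returns one (similarity, communities) snapshot
    per merge. Each community is the node set spanned by one link cluster, so
    communities overlap at `target`.
    """
    links = sorted(g[target])          # link (target, v) is identified by v
    parent = {v: v for v in links}     # union-find forest over links

    def find(v: int) -> int:
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    pairs = sorted(
        ((link_similarity(g, i, j), i, j) for i, j in combinations(links, 2)),
        key=lambda t: -t[0],            # most similar pairs are merged first
    )
    levels: List[Tuple[float, List[Set[int]]]] = []
    for sim, i, j in pairs:
        ri, rj = find(i), find(j)
        if ri == rj:
            continue                    # already in the same link cluster
        parent[rj] = ri
        clusters: Dict[int, Set[int]] = {}
        for v in links:
            clusters.setdefault(find(v), {target}).add(v)
        levels.append((sim, list(clusters.values())))
    return levels
```

Consider a toy graph in which target node 0 sits in two triangles: `g = {0: {1, 2, 3, 4}, 1: {0, 2}, 2: {0, 1}, 3: {0, 4}, 4: {0, 3}}`. The first two merges happen at similarity 1.0 and leave the overlapping communities {0, 1, 2} and {0, 3, 4}. The final merge, at similarity 0.2, joins them. Scoring every pair of incident links takes time quadratic in the target's degree.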
