Put Three and Three Together

Community detection has become one of the most relevant topics in graph data mining because of its applications in fields such as biology, social networks, and network traffic analysis. Although the existing metrics used to quantify the quality of a community work well in general, under some circumstances they fail to capture this notion correctly. The main reason is that these metrics treat the internal edges of a community as a set and ignore how those edges actually connect its vertices. We propose Weighted Community Clustering (WCC), a new community metric that takes the triangle, rather than the edge, as the minimal structural motif indicating a strong relation in a graph. We analyse WCC theoretically in depth and formally prove, by means of a set of properties, that maximizing WCC guarantees communities with cohesion and structure. In addition, we propose Scalable Community Detection (SCD), a community detection algorithm based on WCC that is designed to be fast and scalable on SMP machines. We show experimentally, using real datasets, that WCC correctly captures the concept of community in social networks. Finally, using ground-truth data, we show that SCD provides better quality than the best state-of-the-art disjoint community detection algorithms while running faster.
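The point about edge sets versus triangles can be illustrated with a small sketch. The code below is not the WCC definition, which is not given in this abstract; it only counts internal edges and internal triangles for two hypothetical four-vertex communities (the vertex names, the example graph, and the helper functions are all assumptions made for illustration). Both communities have the same number of internal edges, so a purely edge-based view cannot tell them apart, while a triangle-based view can.

```python
from itertools import combinations


def build_adjacency(edges):
    """Build an undirected adjacency map {vertex: set(neighbours)}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def internal_edges(adj, community):
    """Number of edges with both endpoints inside the community."""
    return sum(len(adj[x] & community) for x in community) // 2


def triangles_of_vertex(adj, x, community):
    """Number of triangles that x closes with two other community members."""
    neighbours = sorted(adj[x] & community)
    return sum(1 for u, v in combinations(neighbours, 2) if v in adj[u])


def internal_triangles(adj, community):
    """Number of triangles fully inside the community (each is seen 3 times)."""
    return sum(triangles_of_vertex(adj, x, community) for x in community) // 3


# Hypothetical example graph: a 4-cycle and a triangle with a pendant vertex.
adj = build_adjacency([
    ("a", "b"), ("b", "c"), ("c", "d"), ("d", "a"),  # ring: no triangles
    ("e", "f"), ("f", "g"), ("e", "g"), ("g", "h"),  # triangle e-f-g plus h
])
ring = {"a", "b", "c", "d"}
tailed = {"e", "f", "g", "h"}

for name, community in (("ring", ring), ("tailed", tailed)):
    print(name, internal_edges(adj, community), internal_triangles(adj, community))
```

Both communities report four internal edges, but only the second contains a triangle, which is the kind of structural difference a triangle-driven metric is meant to pick up.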
