Randomized Greedy Modularity Optimization for Group Detection in Huge Social Networks

Due to the increasing availability of very large social network data sets, there is a need for scalable algorithms that can analyze these networks with reasonable resource requirements. Finding 'natural groups' in these networks has received much attention lately. We present an algorithm that detects communities by optimizing modularity, a measure of the quality of a given decomposition of a network into clusters. Because computing the clustering with maximal modularity is NP-hard, various heuristic algorithms have been proposed. Still, the time or memory complexity of those algorithms does not allow finding communities in huge networks. In this article we present a memory-efficient randomized greedy algorithm that achieves a speed-up over the state of the art while the achieved cluster quality remains very high. The algorithm is compared to the previously best algorithms on nine publicly available data sets and is shown to deliver comparable cluster quality while being faster and using considerably less memory.
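The abstract names modularity as the objective but does not state a formula. As a minimal sketch, assuming the standard Newman-Girvan definition for an undirected, unweighted graph, the modularity of a partition is the sum over clusters of the fraction of edges inside the cluster minus the squared fraction of edge endpoints attached to it. The function name `modularity` and the edge-list and dictionary inputs are illustrative choices, not taken from the article.

```python
from collections import defaultdict


def modularity(edges, community):
    """Newman-Girvan modularity of a partition (assumed definition).

    edges     -- list of (u, v) pairs of an undirected, unweighted graph
    community -- dict mapping each node to its cluster label
    """
    m = len(edges)
    if m == 0:
        return 0.0

    internal = defaultdict(int)  # edges with both endpoints in cluster c
    degree = defaultdict(int)    # edge endpoints attached to cluster c

    for u, v in edges:
        cu, cv = community[u], community[v]
        degree[cu] += 1
        degree[cv] += 1
        if cu == cv:
            internal[cu] += 1

    # Q = sum over clusters of (internal edge fraction - squared endpoint fraction)
    return sum(internal[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)
```

For two triangles joined by a single edge, putting each triangle in its own cluster gives a positive score, while putting all nodes in one cluster gives zero, which is why a greedy optimizer prefers the former.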
