Detecting Clusters/Communities in Social Networks

ABSTRACT Cohen's κ, a similarity measure for categorical data, has been applied to data mining problems such as cluster analysis and network link prediction. This paper examines a new application: community detection in networks. A new algorithm is proposed that uses Cohen's κ as a similarity measure for each pair of nodes and then clusters the resulting κ values to detect communities. The method is defined and tested on a variety of simulated and real networks, and the results are compared with those from eight other community detection algorithms. The new algorithm is consistently among the top performers in classifying data points on both simulated and real networks. Additionally, this is one of the broadest comparative simulation studies of community detection algorithms to date.
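The two-step idea in the abstract (pairwise κ between nodes, then clustering of the κ values) can be illustrated with a minimal Python sketch. Everything beyond that outline is an assumption made for illustration and is not taken from the paper: each node is represented by its binary adjacency row, self-ties are set to 1 so that connected nodes have comparable rows, κ is converted to a dissimilarity as 1 - κ, and average-linkage hierarchical clustering from SciPy stands in for whatever clustering step the authors actually use. The function names are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform


def cohen_kappa_binary(x, y):
    """Cohen's kappa between two binary vectors of equal length."""
    po = np.mean(x == y)                      # observed agreement
    px, py = x.mean(), y.mean()               # marginal proportion of 1s
    pe = px * py + (1 - px) * (1 - py)        # agreement expected by chance
    if np.isclose(pe, 1.0):
        # Kappa is undefined here; returning 1.0 is a convention assumed
        # for this sketch (both vectors are constant and identical).
        return 1.0
    return (po - pe) / (1 - pe)


def kappa_similarity(adj):
    """Pairwise kappa between adjacency rows (assumption: self-ties = 1)."""
    a = adj.copy()
    np.fill_diagonal(a, 1)
    n = a.shape[0]
    k = np.eye(n)
    for i in range(n):
        for j in range(i + 1, n):
            k[i, j] = k[j, i] = cohen_kappa_binary(a[i], a[j])
    return k


def detect_communities(adj, n_communities):
    """Cluster nodes on 1 - kappa with average linkage (an assumed choice)."""
    dist = 1.0 - kappa_similarity(adj)
    np.fill_diagonal(dist, 0.0)
    tree = linkage(squareform(dist, checks=False), method="average")
    return fcluster(tree, t=n_communities, criterion="maxclust")


def two_cliques():
    """Toy network: two 4-node cliques joined by a single edge (3-4)."""
    adj = np.zeros((8, 8), dtype=int)
    adj[:4, :4] = 1
    adj[4:, 4:] = 1
    adj[3, 4] = adj[4, 3] = 1
    np.fill_diagonal(adj, 0)
    return adj


labels = detect_communities(two_cliques(), n_communities=2)
print(labels)  # nodes 0-3 share one label, nodes 4-7 share the other
```

In this toy network the within-clique κ values are high and the between-clique values are strongly negative, so any reasonable clustering of the κ matrix separates the two cliques. The number of communities is supplied by the caller here; how it is chosen in the paper is not stated in the abstract.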
