Node Attribute-enhanced Community Detection in Complex Networks

Community detection groups the nodes of a network so that nodes in the same community are more densely connected to each other than to the rest of the network. Previous studies have focused mainly on identifying communities from node connectivity alone. However, each node in a network may be associated with many attributes, and methods that combine connectivity with node attributes have become increasingly popular in recent years. Most existing methods operate on networks with attributes of only one type (binary, categorical, or numerical). In this study, we introduce kNN-enhance, a simple and flexible community detection approach that uses node attribute enhancement. This approach adds the k Nearest Neighbor (kNN) graph of node attributes to the original network to alleviate its sparsity and noise, thereby strengthening the community structure in the network. We use two testing algorithms, kNN-nearest and kNN-Kmeans, to partition the newly generated, attribute-enhanced graph. Our analyses of synthetic and real-world networks show that the proposed algorithms outperform existing state-of-the-art algorithms. Further, the algorithms can handle networks containing different combinations of binary, categorical, or numerical attributes and could be easily extended to the analysis of massive networks.
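
The attribute-enhancement step described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes purely numerical attributes, Euclidean distance, an unweighted union of the original edges with the kNN edges, and dense NumPy matrices. The function names `knn_graph` and `enhance` and the toy data are hypothetical, and the partitioning algorithms (kNN-nearest, kNN-Kmeans) are not shown.

```python
import numpy as np


def knn_graph(attrs, k):
    """Return a symmetric 0/1 kNN adjacency matrix for an (n, d) attribute matrix.

    Assumption: numerical attributes compared with squared Euclidean distance.
    """
    n = attrs.shape[0]
    diff = attrs[:, None, :] - attrs[None, :, :]
    dist = (diff ** 2).sum(axis=-1)
    np.fill_diagonal(dist, np.inf)  # a node is never its own neighbor
    nearest = np.argsort(dist, axis=1)[:, :k]  # indices of the k closest nodes
    knn = np.zeros((n, n), dtype=int)
    knn[np.repeat(np.arange(n), k), nearest.ravel()] = 1
    return np.maximum(knn, knn.T)  # keep an edge if either endpoint selects it


def enhance(adjacency, attrs, k):
    """Union of the original edges and the attribute kNN edges (assumed combination rule)."""
    return np.maximum(adjacency, knn_graph(attrs, k))


# Toy data: two attribute clusters of three nodes each.
attrs = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
                  [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])

# Sparse, noisy original network: one edge inside each group, one edge between groups.
adjacency = np.zeros((6, 6), dtype=int)
for u, v in [(0, 1), (3, 4), (2, 3)]:
    adjacency[u, v] = adjacency[v, u] = 1

enhanced = enhance(adjacency, attrs, k=2)
print(enhanced)
```

In this toy case the kNN edges fill in the missing within-group links while the single between-group edge stays as it was, so the two groups become denser relative to the connection between them. For mixed binary, categorical, and numerical attributes a different similarity measure would be needed, and for large networks the dense pairwise distance computation would have to be replaced by an approximate kNN graph construction.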
