Clustering attributed graphs: Models, measures and methods

Clustering a graph, i.e., assigning its nodes to groups, is an important operation whose best-known application is the discovery of communities in social networks. Graph clustering and community detection have traditionally focused on graphs without attributes, with the notable exception of edge weights. However, such graphs provide only a partial representation of real social systems, which are therefore often described using node attributes, representing features of the actors, and edge attributes, representing different kinds of relationships among them. We refer to these richer models as attributed graphs. Consequently, existing graph clustering methods have recently been extended to deal with node and edge attributes. This article is a literature survey on this topic: it organizes and presents recent research results in a uniform way, characterizes the main existing clustering methods, and highlights their conceptual differences. We also cover the important topic of clustering evaluation and identify current open problems.
