Discovering top-k non-redundant clusterings in attributed graphs

Many graph clustering algorithms produce a single partition of the vertices of the input graph. A single partition, however, may not provide sufficient insight into the underlying data, so it is worth exploring alternative clustering solutions. Many areas, such as social media marketing, demand the exploration of multiple clustering solutions in social networks to support behavior analysis, for example to find potential customers or influential members according to different perspectives. Moreover, it is desirable to present not just multiple clustering solutions but multiple non-redundant ones, in order to reveal the many possible facets of the underlying dataset. In this paper, we propose RM-CRAG, a novel algorithm to discover the top-k non-redundant clustering solutions in attributed graphs, i.e., a ranking of clusterings that share the least amount of information, in the information-theoretic sense. We also propose MVNMI, an evaluation criterion to assess the quality of a set of clusterings. Experimental results on different datasets show the effectiveness of the proposed algorithm.
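To make the notion of clusterings that "share the least amount of information" concrete, the following minimal sketch computes the standard normalized mutual information (NMI) between two partitions of the same vertices. It is not RM-CRAG or MVNMI, whose definitions are not given in this abstract. The function names and the geometric-mean normalization are assumptions chosen for illustration.

```python
from collections import Counter
from math import log, sqrt


def entropy(labels):
    """Shannon entropy (in nats) of a clustering given as a list of cluster labels."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    """Mutual information (in nats) between two clusterings of the same vertices."""
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    joint = Counter(zip(a, b))
    return sum(
        (n_ij / n) * log(n_ij * n / (count_a[x] * count_b[y]))
        for (x, y), n_ij in joint.items()
    )


def nmi(a, b):
    """NMI with geometric-mean normalization (an assumed, commonly used choice)."""
    if len(a) != len(b):
        raise ValueError("both clusterings must label the same vertices")
    h_a, h_b = entropy(a), entropy(b)
    if h_a == 0 or h_b == 0:
        # A clustering with a single cluster carries no information.
        return 0.0
    return mutual_information(a, b) / sqrt(h_a * h_b)


# Hypothetical example: three clusterings of four vertices.
same_up_to_relabeling = nmi([0, 0, 1, 1], [1, 1, 0, 0])
independent = nmi([0, 0, 1, 1], [0, 1, 0, 1])
```

Under this measure, two partitions that are identical up to a relabeling of clusters score 1, and two statistically independent partitions score 0, so a non-redundant set of clusterings would be one whose pairwise scores are low.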
