Subspace Clustering Meets Dense Subgraph Mining: A Synthesis of Two Paradigms

Today's applications deal with multiple types of information: graph data to represent the relations between objects and attribute data to characterize single objects. Analyzing both data sources simultaneously can increase the quality of mining methods. Recently, combined clustering approaches were introduced, which detect densely connected node sets within one large graph that also show high similarity according to all of their attribute values. However, for attribute data it is known that this full-space clustering often leads to poor clustering results. Thus, subspace clustering was introduced to identify locally relevant subsets of attributes for each cluster. In this work, we propose a method for finding homogeneous groups by joining the paradigms of subspace clustering and dense sub graph mining, i.e. we determine sets of nodes that show high similarity in subsets of their dimensions and that are as well densely connected within the given graph. Our twofold clusters are optimized according to their density, size, and number of relevant dimensions. Our developed redundancy model confines the clustering to a manageable size of only the most interesting clusters. We introduce the algorithm Gamer for the efficient calculation of our clustering. In thorough experiments on synthetic and real world data we show that Gamer achieves low runtimes and high clustering qualities.

[1]  Jonathan Goldstein,et al.  When Is ''Nearest Neighbor'' Meaningful? , 1999, ICDT.

[2]  Philip S. Yu,et al.  Fast algorithms for projected clustering , 1999, SIGMOD '99.

[3]  Daniel Hanisch,et al.  Co-clustering of biological networks and gene expression data , 2002, ISMB.

[4]  Bin Wu,et al.  Community detection in large-scale social networks , 2007, WebKDD/SNA-KDD '07.

[5]  May D. Wang,et al.  GoMiner: a resource for biological interpretation of genomic and proteomic data , 2003, Genome Biology.

[6]  Andrew W. Moore,et al.  Tractable group detection on large link data sets , 2003, Third IEEE International Conference on Data Mining.

[7]  Martin Ester,et al.  Mining Cohesive Patterns from Graphs with Feature Vectors , 2009, SDM.

[8]  Hong Cheng,et al.  Graph Clustering Based on Structural/Attribute Similarities , 2009, Proc. VLDB Endow..

[9]  Guimei Liu,et al.  Effective Pruning Techniques for Mining Quasi-Cliques , 2008, ECML/PKDD.

[10]  Ira Assent,et al.  Evaluating Clustering in Subspace Projections of High Dimensional Data , 2009, Proc. VLDB Endow..

[11]  Charu C. Aggarwal,et al.  Managing and Mining Graph Data , 2010, Managing and Mining Graph Data.

[12]  Jianyong Wang,et al.  Coherent closed quasi-clique discovery from large dense graph databases , 2006, KDD '06.

[13]  Ron Shamir,et al.  Identification of functional modules using network topology and high-throughput data , 2007, BMC Systems Biology.

[14]  Ron Rymon,et al.  Search through Systematic Set Enumeration , 1992, KR.

[15]  Hans-Peter Kriegel,et al.  Clustering high-dimensional data: A survey on subspace clustering, pattern-based clustering, and correlation clustering , 2009, TKDD.