ASCLU: Alternative Subspace Clustering

Finding groups of similar objects in databases is one of the most important data mining tasks. Recently, traditional clustering approaches have been extended to generate alternative clustering solutions. The basic observation is that multiple meaningful groupings might exist for each database object: the data can be clustered from different perspectives. It is thus reasonable to search for clusters that deviate from a given clustering result with which the user is not satisfied. Existing methods focus on full-space clustering. However, in today’s applications, where many attributes are recorded per object, traditional clustering is known to generate no meaningful results. Instead, analyzing subspace projections of the data with subspace or projected clustering techniques is more suitable. In this paper, we develop the first method that detects alternative subspace clusters based on an already known subspace clustering. By considering subspace projections, we can identify alternative clusters based on deviating dimension sets in addition to deviating object sets. Thus, we realize different views on the data by using different attributes. Besides addressing the challenge of detecting alternative subspace clusters, our model avoids redundant clusters in the overall result, i.e., the generated clusters are dissimilar to each other. In experiments, we analyze the effectiveness of our model and show that it generates meaningful alternative subspace clustering solutions.
