Cluster Evaluation of Density Based Subspace Clustering

Clustering real-world data often suffers from the curse of dimensionality, because such data typically consist of many dimensions. Clustering of multidimensional data can be evaluated using a density-based approach. Density-based approaches follow the paradigm introduced by the DBSCAN clustering algorithm. In this approach, the density of each object's neighbourhood is calculated with respect to MinPoints. Cluster membership changes in accordance with changes in the density of each object's neighbourhood. The neighbours of each object are typically determined using a distance function, for example the Euclidean distance. In this paper, the SUBCLU, FIRES and INSCY methods are applied to cluster synthetic datasets of dimension 6x1595. IO entropy, F1 measure, coverage, accuracy and time consumption are used as performance evaluation parameters. The evaluation results show that the SUBCLU method requires considerable time for subspace clustering; however, its coverage is better. Meanwhile, the INSCY method achieves better accuracy than the other two methods, although at the cost of longer computation time.
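
The density test described above can be illustrated with a minimal sketch. It assumes a DBSCAN-style definition in which an object is dense (a core object) when at least MinPoints objects, itself included, lie within a radius eps under the Euclidean distance restricted to a chosen subspace. The names `euclidean`, `neighbours`, `is_core`, `eps` and `dims`, and the toy data, are illustrative assumptions and are not taken from the SUBCLU, FIRES or INSCY implementations evaluated in the paper.

```python
import math

def euclidean(p, q, dims):
    """Euclidean distance between p and q, restricted to the subspace `dims`."""
    return math.sqrt(sum((p[d] - q[d]) ** 2 for d in dims))

def neighbours(data, i, eps, dims):
    """Indices of all objects within distance eps of object i (object i included)."""
    return [j for j, q in enumerate(data) if euclidean(data[i], q, dims) <= eps]

def is_core(data, i, eps, min_points, dims):
    """An object is dense if its eps-neighbourhood holds at least min_points objects."""
    return len(neighbours(data, i, eps, dims)) >= min_points

# Hypothetical toy data: three objects are close in dimensions 0 and 1,
# but far apart in dimension 2.
data = [(0.0, 0.0, 9.0), (0.1, 0.0, 1.0), (0.0, 0.1, 5.0), (5.0, 5.0, 5.0)]

# Density evaluated in the subspace spanned by dimensions 0 and 1.
core_subspace = [is_core(data, i, eps=0.2, min_points=3, dims=(0, 1))
                 for i in range(len(data))]

# Density evaluated in the full three-dimensional space.
core_full = [is_core(data, i, eps=0.2, min_points=3, dims=(0, 1, 2))
             for i in range(len(data))]
```

In this toy example the first three objects are dense in the two-dimensional subspace but not in the full space, which is the situation subspace clustering methods are designed to detect.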
