Cluster cores-based clustering for high dimensional data

We propose a new approach to clustering high-dimensional data that is based on a novel notion of cluster cores rather than on nearest neighbors. A cluster core is a fairly dense group containing a maximal number of pairwise similar objects. It represents the core of a cluster, since all objects in the cluster are strongly attracted to it; as a result, building clusters from cluster cores achieves high accuracy. Other major characteristics of the approach are: (1) it uses a semantics-based similarity measure; (2) it does not suffer from the curse of dimensionality and scales linearly with the dimensionality of the data; and (3) it outperforms the well-known clustering algorithm ROCK, with both lower time complexity and higher accuracy.
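The abstract describes cluster cores only informally, so the following is a minimal sketch of one possible reading, not the paper's algorithm. It makes several assumptions. The data are treated as transactional item sets. Jaccard overlap with a threshold `theta` stands in for the paper's semantics-based similarity. A core is a greedily grown clique of pairwise-similar objects, which serves as a heuristic because exact maximum clique is NP-hard. An object's "attraction" to a core is the fraction of that core's members it is similar to. All function names and parameters (`jaccard`, `similarity_graph`, `greedy_clique`, `find_cores`, `assign`, `theta`, `min_size`) are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Stand-in similarity (assumption): Jaccard overlap of two item sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def similarity_graph(data, theta):
    """Link objects i and j when their similarity reaches the threshold theta."""
    adj = {i: set() for i in range(len(data))}
    for i, j in combinations(range(len(data)), 2):
        if jaccard(data[i], data[j]) >= theta:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def greedy_clique(adj, candidates):
    """Grow a clique (pairwise-similar objects) greedily from `candidates`.

    Heuristic: repeatedly add the candidate with the most neighbours among
    the remaining candidates, then drop candidates not linked to it.
    """
    clique, pool = [], set(candidates)
    while pool:
        # Ties are broken toward the smaller index so the result is deterministic.
        v = max(pool, key=lambda u: (len(adj[u] & pool), -u))
        clique.append(v)
        pool &= adj[v]  # keep only candidates similar to every member so far
    return sorted(clique)


def find_cores(adj, min_size):
    """Peel off greedy cliques of at least `min_size` objects as cluster cores."""
    remaining, cores = set(adj), []
    while remaining:
        core = greedy_clique(adj, remaining)
        if len(core) < min_size:
            break
        cores.append(core)
        remaining -= set(core)
    return cores


def assign(data, cores, theta):
    """Label each object with the core that attracts it most (-1 = unassigned).

    Attraction (assumption) = fraction of a core's members the object is
    similar to; core members keep the label of their own core.
    """
    labels = [-1] * len(data)
    for k, core in enumerate(cores):
        for m in core:
            labels[m] = k
    for i, x in enumerate(data):
        if labels[i] != -1:
            continue
        best, best_score = -1, 0.0
        for k, core in enumerate(cores):
            score = sum(jaccard(x, data[m]) >= theta for m in core) / len(core)
            if score > best_score:
                best, best_score = k, score
        labels[i] = best
    return labels


# Toy transactional data: two item groups, one borderline record, one outlier.
data = [frozenset(s) for s in ("abc", "abd", "abcd", "xyz", "xyw", "xyzw", "abx", "pq")]
theta = 0.5
cores = find_cores(similarity_graph(data, theta), min_size=3)
labels = assign(data, cores, theta)
print(cores)   # [[0, 1, 2], [3, 4, 5]]
print(labels)  # [0, 0, 0, 1, 1, 1, 0, -1]
```

In this toy run, the borderline record `abx` is similar to two of the three members of the first core and to none of the second, so it joins the first cluster. The record `pq` is similar to no core member and stays unassigned. The sketch first fixes dense cores and only then attaches the remaining objects, rather than linking objects to their nearest neighbours one at a time.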
