Paper information - High dimensional data Clustering Algorithm Based on Sparse Feature Vector for Categorical Attributes


This paper proposes an algorithm for clustering high-dimensional data, named Clustering Algorithm Based On Sparse Feature Vector for Categorical Attributes (CABOSFV_C). The algorithm compresses the data effectively with a ‘Sparse Feature Vector of a Set for Categorical Data’, which retains the information needed to make clustering decisions, and it obtains the clustering result in a single scan of the data by using the ‘Sparse Feature Dissimilarity of a Set for Categorical Data’ as its distance measure. Because of this data reduction and the single-scan strategy, the algorithm has almost linear computational complexity and handles noise effectively. In addition, CABOSFV_C is suitable not only for sparse data but also for complete data. Two numerical examples at the end of the paper illustrate this, along with other salient features of the algorithm.
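The abstract does not give the definitions of the sparse feature vector or the sparse feature dissimilarity, so the following is only a minimal sketch of the general idea: summarize each cluster compactly and assign every record in one pass. Everything in it is an assumption made for illustration, not the paper's method: the names `SetSummary`, `dissimilarity`, and `single_pass_cluster` are hypothetical, the summary stores only the values shared by all members, and the dissimilarity is a simple stand-in (the fraction of attributes on which the members of a set disagree).

```python
from dataclasses import dataclass
from typing import Hashable, List, Sequence

# Sentinel marking an attribute on which the members of a set disagree.
_MIXED = object()


@dataclass
class SetSummary:
    """Hypothetical compressed description of a cluster: its size, the
    per-attribute value shared by all members (or _MIXED), and member ids."""
    size: int
    shared: List[object]
    members: List[int]

    def merged_shared(self, record: Sequence[Hashable]) -> List[object]:
        # Summary the set would have if `record` were added to it.
        return [s if s is not _MIXED and s == v else _MIXED
                for s, v in zip(self.shared, record)]


def dissimilarity(shared: List[object]) -> float:
    # Assumed stand-in measure: fraction of attributes that are not shared.
    return sum(1 for s in shared if s is _MIXED) / len(shared)


def single_pass_cluster(records: Sequence[Sequence[Hashable]],
                        threshold: float) -> List[List[int]]:
    """Scan the records once; put each record into the existing cluster whose
    merged summary has the lowest dissimilarity, provided it does not exceed
    `threshold`; otherwise start a new cluster."""
    clusters: List[SetSummary] = []
    for index, record in enumerate(records):
        best, best_shared, best_d = None, None, None
        for cluster in clusters:
            candidate = cluster.merged_shared(record)
            d = dissimilarity(candidate)
            if d <= threshold and (best_d is None or d < best_d):
                best, best_shared, best_d = cluster, candidate, d
        if best is None:
            clusters.append(SetSummary(1, list(record), [index]))
        else:
            best.size += 1
            best.shared = best_shared
            best.members.append(index)
    return [c.members for c in clusters]


records = [("a", "x", "p"), ("a", "x", "q"), ("b", "y", "p"), ("b", "y", "p")]
print(single_pass_cluster(records, threshold=0.34))  # [[0, 1], [2, 3]]
```

Because each cluster is represented only by its summary, a new record is compared against summaries rather than against every earlier record, which is the kind of data reduction the abstract credits for the near-linear cost. The result of a single-pass scheme like this one depends on the order of the records and on the chosen threshold.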

Sen Wu | Guiying Wei
