PARTCAT: A Subspace Clustering Algorithm for High Dimensional Categorical Data

A new subspace clustering algorithm, PARTCAT, is proposed for clustering high dimensional categorical data. The architecture of PARTCAT is based on the recently developed neural network architecture PART, with a major modification to handle categorical attributes. PARTCAT requires fewer parameters than PART. In particular, it does not need the distance parameter that PART requires, a parameter intimately related to the similarity in each fixed dimension. Simulations on real data sets are provided to demonstrate the performance of PARTCAT.
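One way to see why a distance parameter can be dropped for categorical attributes is to compare per-dimension similarity checks. The sketch below is illustrative only and is not taken from the paper: the function names and the threshold `sigma` are hypothetical, and it assumes that similarity in a numeric dimension is decided by a distance threshold, while similarity in a categorical dimension reduces to plain equality.

```python
from typing import List, Sequence


def numeric_match(x: float, z: float, sigma: float) -> bool:
    """Assumed numeric check: similar if the values lie within sigma of each other."""
    return abs(x - z) <= sigma


def categorical_match(x: str, z: str) -> bool:
    """Assumed categorical check: similar only if the two values are identical."""
    return x == z


def matching_dimensions(point: Sequence[str], template: Sequence[str]) -> List[int]:
    """Return the indices of the dimensions in which point and template agree."""
    return [i for i, (x, z) in enumerate(zip(point, template))
            if categorical_match(x, z)]


# Hypothetical records with three categorical attributes.
record = ("red", "small", "round")
template = ("red", "large", "round")
agreeing = matching_dimensions(record, template)
```

Under these assumptions the categorical check has no tunable threshold, whereas the numeric check depends on the choice of `sigma`.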
