Paper information - A k-populations algorithm for clustering categorical data

A k-populations algorithm for clustering categorical data

In this paper, the conventional k-modes-type algorithms for clustering categorical data are extended by representing the clusters with k-populations instead of the hard-type centroids used in those algorithms. A population-based centroid representation preserves the uncertainty inherent in a data set for as long as possible before actual decisions are made. In various experiments, the k-populations algorithm gave markedly better clustering results.
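To make the contrast between a hard centroid and a population-based one concrete, here is a minimal Python sketch. It is not the authors' implementation: the function names (`hard_mode`, `population`, `dissimilarity`), the toy data, and the dissimilarity measure are all assumptions chosen for illustration. A hard mode keeps only the most frequent category per attribute, while a population keeps the relative frequency of every category seen in the cluster.

```python
from collections import Counter

def hard_mode(cluster):
    """Most frequent category per attribute (a k-modes-style hard centroid)."""
    return [Counter(col).most_common(1)[0][0] for col in zip(*cluster)]

def population(cluster):
    """Relative frequency of every category per attribute (assumed representation)."""
    n = len(cluster)
    return [{value: count / n for value, count in Counter(col).items()}
            for col in zip(*cluster)]

def dissimilarity(record, pop):
    """Assumed measure: for each attribute, 1 minus the frequency of the record's value."""
    return sum(1.0 - dist.get(value, 0.0) for value, dist in zip(record, pop))

# Hypothetical toy cluster with two categorical attributes (color, size).
cluster = [("red", "small"), ("red", "large"), ("blue", "small"), ("red", "small")]

mode = hard_mode(cluster)    # keeps only the majority value of each attribute
pop = population(cluster)    # keeps the minority values and their frequencies too
```

In this sketch the hard mode discards the fact that a quarter of the records are blue or large, whereas the population retains it, so a record such as `("blue", "large")` is scored against the full distribution rather than against a single representative value.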

Doheon Lee | Dae-Won Kim | Kwang Hyung Lee | Ki Young Lee

[1] Catherine Blake, et al. UCI Repository of machine learning databases, 1998.

[2] Michael K. Ng, et al. A fuzzy k-modes algorithm for clustering categorical data, 1999, IEEE Trans. Fuzzy Syst.

[3] Joshua Zhexue Huang, et al. Extensions to the k-Means Algorithm for Clustering Large Data Sets with Categorical Values, 1998, Data Mining and Knowledge Discovery.

[4] Edwin Diday, et al. Symbolic clustering using a new dissimilarity measure, 1991, Pattern Recognit.