An Incremental Clustering with Attribute Unbalance Considered for Categorical Data

Clustering analysis is an important technique used in many fields. Traditional clustering algorithms generally deal with numeric data, yet clustering categorical data has long attracted researchers’ attention because such data are prevalent in real life. This paper analyses the limitations of previously proposed categorical clustering algorithms. Based on two observations, a new similarity measure for categorical data is proposed that takes the unbalance of attributes into account. As data sets grow larger and more dynamic, the ability to cluster incrementally becomes an important quality of a good clustering algorithm. The clustering algorithm presented here is incremental and has linear computational complexity. The experimental results indicate that it outperforms the other categorical clustering algorithms referred to in the paper.