Categorical Data Clustering: A Correlation-Based Approach for Unsupervised Attribute Weighting

Interest in attribute weighting for clustering tasks has been increasing in recent years. However, few attempts have been made to apply automated attribute weighting to categorical data clustering. Most existing approaches compute the weights from the frequency of the mode category or from the average distance of data objects to the mode of a cluster. In this paper, we adopt a different approach and investigate how the correlation among categorical attributes can be used to measure their relevance in clustering tasks. As a result, we propose a correlation-based attribute weighting approach for categorical attributes.
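To make the general idea concrete, the following is a minimal sketch of one way a correlation-based weight could be computed for categorical attributes. The abstract does not say which correlation measure or aggregation the proposed approach uses, so everything here is an assumption for illustration: Cramér's V stands in as the pairwise association measure, each attribute is scored by the sum of its associations with the other attributes, and the function names `cramers_v` and `correlation_weights` are hypothetical. It should not be read as the method proposed in the paper.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def cramers_v(x, y):
    """Cramér's V between two equal-length sequences of categorical values.

    Assumption: used here only as a stand-in association measure.
    """
    n = len(x)
    count_x, count_y = Counter(x), Counter(y)
    if len(count_x) < 2 or len(count_y) < 2:
        return 0.0  # a constant attribute shows no association
    joint = Counter(zip(x, y))
    chi2 = 0.0
    for a, n_a in count_x.items():
        for b, n_b in count_y.items():
            expected = n_a * n_b / n
            observed = joint.get((a, b), 0)
            chi2 += (observed - expected) ** 2 / expected
    return sqrt(chi2 / (n * (min(len(count_x), len(count_y)) - 1)))


def correlation_weights(rows):
    """Return one weight per attribute, normalized to sum to 1.

    rows: list of equal-length tuples of categorical values.
    Assumption: an attribute's score is the sum of its pairwise
    Cramér's V values with every other attribute.
    """
    columns = list(zip(*rows))
    m = len(columns)
    scores = [0.0] * m
    for i, j in combinations(range(m), 2):
        v = cramers_v(columns[i], columns[j])
        scores[i] += v
        scores[j] += v
    total = sum(scores)
    if total == 0:
        return [1.0 / m] * m  # no association anywhere: uniform weights
    return [s / total for s in scores]


# Toy data: the first two attributes vary together, the third is independent.
rows = [("a", "x", "p"), ("a", "x", "q"), ("b", "y", "p"), ("b", "y", "q")]
weights = correlation_weights(rows)
```

In this toy data set the first two attributes are perfectly associated and the third is independent of both, so under these assumptions the third attribute receives zero weight. Weights of this kind could then scale the per-attribute mismatch terms of a k-modes style dissimilarity, although how the weights enter the clustering objective is likewise not stated in the abstract.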
