Clustering of Mixed Data by Integrating Fuzzy, Probabilistic, and Collaborative Clustering Framework

Abstract

Clustering of numerical data is a well-researched problem, and so is clustering of categorical data. However, the literature on clustering data with mixed attributes is considerably less developed. For numerical data, fuzzy clustering, in particular the fuzzy c-means (FCM) algorithm, is very effective and popular, while for categorical data, mixture models are widely used. In this paper, we propose a novel framework for clustering mixed data, that is, data containing both numerical and categorical attributes. Our objective is to find the cluster substructures that are common to both the categorical and the numerical data. Our formulation is inspired by the FCM algorithm (for handling numerical data), mixture models (for handling categorical data), and the collaborative clustering framework (for aggregating the two). It is an integrated approach that judiciously uses all three components. We apply our algorithm to a few commonly used datasets and compare our results with those of some state-of-the-art methods.
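As a reference point for the numerical-data component named above, here is a minimal pure-Python sketch of standard fuzzy c-means. It alternates the usual two updates that minimize the objective J_m = sum_i sum_k u_ik^m ||x_k - v_i||^2. This is not the paper's integrated mixed-data formulation. That formulation also couples a categorical mixture model through collaborative clustering and is not reproduced here. The function name `fuzzy_c_means`, the fuzzifier default `m=2.0`, the tolerance, and the random membership initialization are illustrative assumptions.

```python
import random
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fuzzy_c_means(
    data: Sequence[Sequence[float]],
    c: int,
    m: float = 2.0,
    max_iter: int = 100,
    tol: float = 1e-6,
    seed: int = 0,
) -> Tuple[Matrix, Matrix]:
    """Hypothetical sketch of standard FCM (alternating optimization).

    Returns (centers, u), where u[i][k] is the membership of point k in cluster i.
    """
    n, dim = len(data), len(data[0])
    rng = random.Random(seed)
    # Random initial memberships; each point's memberships sum to 1.
    u = [[0.0] * n for _ in range(c)]
    for k in range(n):
        w = [rng.random() + 1e-9 for _ in range(c)]
        s = sum(w)
        for i in range(c):
            u[i][k] = w[i] / s

    centers: Matrix = []
    for _ in range(max_iter):
        # Prototype update: v_i = sum_k u_ik^m x_k / sum_k u_ik^m
        centers = []
        for i in range(c):
            wts = [u[i][k] ** m for k in range(n)]
            total = max(sum(wts), 1e-12)
            centers.append(
                [sum(wts[k] * data[k][d] for k in range(n)) / total for d in range(dim)]
            )
        # Membership update: u_ik = 1 / sum_j (d_ik^2 / d_jk^2)^(1/(m-1))
        new_u = [[0.0] * n for _ in range(c)]
        for k in range(n):
            d2 = [_sq_dist(data[k], centers[i]) for i in range(c)]
            hits = [i for i in range(c) if d2[i] == 0.0]
            if hits:  # point sits exactly on a prototype: split membership among those
                for i in hits:
                    new_u[i][k] = 1.0 / len(hits)
            else:
                for i in range(c):
                    new_u[i][k] = 1.0 / sum(
                        (d2[i] / d2[j]) ** (1.0 / (m - 1.0)) for j in range(c)
                    )
        # Stop once memberships change by less than tol.
        shift = max(abs(new_u[i][k] - u[i][k]) for i in range(c) for k in range(n))
        u = new_u
        if shift < tol:
            break
    return centers, u
```

For example, on two well-separated groups of 2-D points, taking the largest membership of each point recovers the two groups. As `m` approaches 1, the memberships approach hard k-means assignments, and larger `m` gives softer partitions.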
