Rough subspace-based clustering ensemble for categorical data

Clustering categorical data is an important problem in data mining that has recently attracted much attention. This paper first studies unsupervised dimensionality reduction for categorical data. Based on rough set theory, the attributes of a categorical data set are decomposed into a number of rough subspaces. A novel clustering ensemble algorithm based on these rough subspaces is then proposed for categorical data. The algorithm clusters the data in a selection of high-quality rough subspaces and combines the resulting partitions to yield a robust and stable solution. We also introduce a cluster index for evaluating the solutions of clustering algorithms on categorical data. Experimental results on selected UCI data sets show that the proposed method outperforms the compared methods as measured by cluster validity indexes.
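The general idea of combining partitions obtained from attribute subspaces can be illustrated with a minimal sketch. It is not the algorithm proposed in the paper: the attribute subsets are hand-picked rather than derived as rough subspaces, no subspace quality measure or cluster index is computed, and the consensus step is a plain co-association (evidence accumulation) scheme with a hypothetical threshold of 0.5. Each base partition is the set of equivalence classes of the rough-set indiscernibility relation on one attribute subset. All function names and the toy data are assumptions made for illustration.

```python
def indiscernibility_partition(rows, attrs):
    """Group row indices whose values agree on every attribute in attrs."""
    blocks = {}
    for i, row in enumerate(rows):
        key = tuple(row[a] for a in attrs)
        blocks.setdefault(key, []).append(i)
    return list(blocks.values())


def co_association(partitions, n):
    """Fraction of partitions in which each pair of objects shares a block."""
    counts = [[0] * n for _ in range(n)]
    for part in partitions:
        for block in part:
            for i in block:
                for j in block:
                    counts[i][j] += 1
    return [[c / len(partitions) for c in row] for row in counts]


def consensus(co, threshold=0.5):
    """Link pairs whose co-association exceeds the (assumed) threshold
    and label the connected components of the resulting graph."""
    n = len(co)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            u = stack.pop()
            for v in range(n):
                if labels[v] == -1 and co[u][v] > threshold:
                    labels[v] = current
                    stack.append(v)
        current += 1
    return labels


# Toy categorical data: (colour, size, shape). Purely illustrative.
rows = [
    ("red", "small", "round"),
    ("red", "small", "round"),
    ("red", "large", "round"),
    ("blue", "large", "square"),
    ("blue", "large", "square"),
    ("blue", "small", "square"),
]
# Hand-picked attribute subsets standing in for rough subspaces.
subspaces = [(0,), (1,), (2,)]

partitions = [indiscernibility_partition(rows, s) for s in subspaces]
co = co_association(partitions, len(rows))
labels = consensus(co)
print(labels)
```

On this toy data the colour and shape subspaces agree with each other while the size subspace disagrees, so the consensus follows the majority and separates the first three objects from the last three.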
