Knowledge based Cluster Ensemble

Although many cluster ensemble approaches exist, few of them take prior knowledge of the dataset into account. In this paper, we propose a new cluster ensemble approach called knowledge based cluster ensemble (KCE), which incorporates prior knowledge of the dataset into the cluster ensemble framework. Specifically, the prior knowledge is first represented as side information encoded as pairwise constraints. KCE then generates a set of cluster solutions using a basic clustering algorithm. Next, KCE transforms the pairwise constraints into a confidence factor for each cluster solution. A new data matrix (the consensus matrix) is then constructed from all the cluster solutions and their corresponding confidence factors. Finally, the result is obtained by partitioning the consensus matrix. Experiments show that (1) KCE works well on real datasets and (2) KCE outperforms most state-of-the-art cluster ensemble approaches.
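
The steps above can be illustrated with a minimal sketch. The abstract does not give the formulas, so everything concrete here is an assumption made for illustration: the confidence factor is taken to be the fraction of pairwise constraints a cluster solution satisfies, the consensus matrix is a confidence-weighted co-association matrix, and the final partition is obtained by thresholding that matrix and taking connected components. The function names, the threshold, and the toy data are hypothetical and are not taken from the paper.

```python
def confidence_factor(labels, must_link, cannot_link):
    """Assumed confidence factor: fraction of pairwise constraints satisfied."""
    total = len(must_link) + len(cannot_link)
    if total == 0:
        return 1.0  # no side information: treat every solution equally
    satisfied = sum(labels[i] == labels[j] for i, j in must_link)
    satisfied += sum(labels[i] != labels[j] for i, j in cannot_link)
    return satisfied / total


def weighted_consensus_matrix(solutions, weights):
    """Assumed consensus matrix: confidence-weighted co-association."""
    n = len(solutions[0])
    total = sum(weights)
    if total == 0:
        # Every solution violates every constraint: fall back to equal weights.
        weights = [1.0] * len(solutions)
        total = float(len(solutions))
    matrix = [[0.0] * n for _ in range(n)]
    for labels, weight in zip(solutions, weights):
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    matrix[i][j] += weight / total
    return matrix


def partition(matrix, threshold=0.5):
    """Assumed partitioning step: connected components of the thresholded matrix."""
    n = len(matrix)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and matrix[i][j] > threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


# Toy data (hypothetical): three cluster solutions over six points.
solutions = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1, 1],
    [0, 1, 2, 0, 1, 2],
]
must_link = [(0, 1), (3, 4)]   # pairs that should share a cluster
cannot_link = [(0, 5)]         # pairs that should be separated

weights = [confidence_factor(s, must_link, cannot_link) for s in solutions]
consensus = weighted_consensus_matrix(solutions, weights)
final_labels = partition(consensus)
print(weights)
print(final_labels)
```

In this toy run the third solution violates both must-link constraints, so it receives a low weight and has little influence on the consensus matrix. A real implementation would likely replace the connected-components step with a stronger partitioning method; that choice is left open here because the abstract does not specify it.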
