A Method to Improve the Analysis of Cluster Ensembles

Clustering is fundamental to understanding the structure of data. Over the past decade, the cluster ensemble problem has been introduced, which combines a set of partitions of the data (an ensemble) to obtain a single consensus solution that outperforms every ensemble member. However, there is disagreement about which ensemble characteristics lead to good performance: some authors have suggested that highly different partitions within the ensemble are beneficial for the final performance, whereas others have stated that medium diversity among them is better. Although several measures exist to quantify diversity, a better method for analyzing the best ensemble characteristics is needed. This paper introduces a new ensemble generation strategy and a method to make slight changes in the structure of the ensemble. Experimental results on six datasets suggest that this is an important step towards a more systematic approach to analyzing the impact of ensemble characteristics on overall consensus performance.
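To make the notion of ensemble diversity concrete, the sketch below computes one commonly used measure: the average of one minus the adjusted Rand index (ARI) over all pairs of partitions in the ensemble. This is an illustrative assumption, not necessarily the measure used in the paper; the function names `adjusted_rand_index` and `ensemble_diversity` are hypothetical, and each partition is assumed to be a list of cluster labels over the same objects.

```python
from collections import Counter
from itertools import combinations
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two partitions of the same objects."""
    if len(labels_a) != len(labels_b):
        raise ValueError("partitions must label the same objects")
    n = len(labels_a)
    if n < 2:
        return 1.0  # no object pairs to disagree on
    # Pairs of objects co-clustered in both partitions (contingency cells).
    cells = Counter(zip(labels_a, labels_b))
    sum_cells = sum(comb(c, 2) for c in cells.values())
    # Pairs co-clustered within each partition separately (marginals).
    sum_rows = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_cols = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_rows * sum_cols / comb(n, 2)
    maximum = (sum_rows + sum_cols) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. both partitions have one cluster
    return (sum_cells - expected) / (maximum - expected)


def ensemble_diversity(ensemble):
    """Mean of (1 - ARI) over all pairs of partitions in the ensemble."""
    pairs = list(combinations(ensemble, 2))
    if not pairs:
        raise ValueError("need at least two partitions")
    return sum(1.0 - adjusted_rand_index(p, q) for p, q in pairs) / len(pairs)


if __name__ == "__main__":
    # Toy ensemble: three partitions of six objects (made-up labels).
    ensemble = [
        [0, 0, 0, 1, 1, 1],
        [1, 1, 1, 0, 0, 0],  # same grouping as the first, relabeled
        [0, 0, 1, 1, 2, 2],
    ]
    print(ensemble_diversity(ensemble))
```

Under this measure, an ensemble whose members all induce the same grouping has diversity 0, regardless of how the clusters are labeled, and the value grows as the partitions disagree more. Other pairwise measures, such as normalized mutual information, could be substituted for the ARI without changing the structure of the computation.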
