Diversity control for improving the analysis of consensus clustering

Consensus clustering has emerged as a powerful technique for obtaining better clustering results, where a set of data partitions (ensemble) are generated, which are then combined to obtain a consolidated solution (consensus partition) that outperforms all of the members of the input set. The diversity of ensemble partitions has been found to be a key aspect for obtaining good results, but the conclusions of previous studies are contradictory. Therefore, ensemble diversity analysis is currently an important issue because there are no methods for smoothly changing the diversity of an ensemble, which makes it very difficult to study the impact of ensemble diversity on consensus results. Indeed, ensembles with similar diversity can have very different properties, thereby producing a consensus function with unpredictable behavior. In this study, we propose a novel method for increasing and decreasing the diversity of data partitions in a smooth manner by adjusting a single parameter, thereby achieving fine-grained control of ensemble diversity. The results obtained using well-known data sets indicate that the proposed method is effective for controlling the dissimilarity among ensemble members to obtain a consensus function with smooth behavior. This method is important for facilitating the analysis of the impact of ensemble diversity in consensus clustering.

[1]  Joydeep Ghosh,et al.  Cluster Ensembles --- A Knowledge Reuse Framework for Combining Multiple Partitions , 2002, J. Mach. Learn. Res..

[2]  Sergio Greco,et al.  Diversity-Based Weighting Schemes for Clustering Ensembles , 2009, SDM.

[3]  Ana L. N. Fred,et al.  Combining multiple clusterings using evidence accumulation , 2005, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[4]  Chang-Dong Wang,et al.  Weighted Multi-view Clustering with Feature Selection , 2016, Pattern Recognit..

[5]  A. Asuncion,et al.  UCI Machine Learning Repository, University of California, Irvine, School of Information and Computer Sciences , 2007 .

[6]  Chang-Dong Wang,et al.  Ensemble clustering using factor graph , 2016, Pattern Recognit..

[7]  Ludmila I. Kuncheva,et al.  Using diversity in cluster ensembles , 2004, 2004 IEEE International Conference on Systems, Man and Cybernetics (IEEE Cat. No.04CH37583).

[8]  Anil K. Jain,et al.  Adaptive clustering ensembles , 2004, ICPR 2004.

[9]  Xiaoli Z. Fern,et al.  Cluster Ensemble Selection , 2008, Statistical analysis and data mining.

[10]  Ludmila I. Kuncheva,et al.  Moderate diversity for better cluster ensembles , 2006, Inf. Fusion.

[11]  Tossapon Boongoen,et al.  A Link-Based Approach to the Cluster Ensemble Problem , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[12]  Anil K. Jain Data clustering: 50 years beyond K-means , 2008, Pattern Recognit. Lett..

[13]  Yunjun Gao,et al.  Hybrid clustering solution selection strategy , 2014, Pattern Recognit..

[14]  Hareton K. N. Leung,et al.  Incremental Semi-Supervised Clustering Ensemble for High Dimensional Data Clustering , 2016, IEEE Transactions on Knowledge and Data Engineering.

[15]  Jingsheng Lei,et al.  A clustering ensemble: Two-level-refined co-association matrix with path-based transformation , 2015, Pattern Recognit..

[16]  André Carlos Ponce de Leon Ferreira de Carvalho,et al.  Cluster ensemble selection based on relative validity indexes , 2012, Data Mining and Knowledge Discovery.

[17]  Jon M. Kleinberg,et al.  An Impossibility Theorem for Clustering , 2002, NIPS.

[18]  Anil K. Jain,et al.  Clustering ensembles: models of consensus and weak partitions , 2005, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[19]  Georgina Stegmayer,et al.  A Biologically Inspired Validity Measure for Comparison of Clustering Methods over Metabolic Data Sets , 2012, IEEE/ACM Transactions on Computational Biology and Bioinformatics.

[20]  Jane You,et al.  Hybrid cluster ensemble framework based on the random combination of data transformation operators , 2012, Pattern Recognit..

[21]  Hui Xiong,et al.  K-Means-Based Consensus Clustering: A Unified View , 2015, IEEE Transactions on Knowledge and Data Engineering.

[22]  Rui Xu,et al.  Clustering Algorithms in Biomedical Research: A Review , 2010, IEEE Reviews in Biomedical Engineering.

[23]  Tossapon Boongoen,et al.  A Link-Based Cluster Ensemble Approach for Categorical Data Clustering , 2012, IEEE Transactions on Knowledge and Data Engineering.

[24]  Hamido Fujita,et al.  Hierarchical cluster ensemble model based on knowledge granulation , 2016, Knowl. Based Syst..

[25]  Carlotta Domeniconi,et al.  Weighted cluster ensembles: Methods and analysis , 2009, TKDD.

[26]  Jane You,et al.  From cluster ensemble to structure ensemble , 2012, Inf. Sci..

[27]  Alessandro Fiori,et al.  DeCoClu: Density consensus clustering approach for public transport data , 2016, Inf. Sci..

[28]  Yunjun Gao,et al.  Probabilistic cluster structure ensemble , 2014, Inf. Sci..

[29]  Zhiwen Yu,et al.  Graph-based consensus clustering for class discovery from gene expression data , 2007, Bioinform..

[30]  Hans-Peter Kriegel,et al.  Subspace clustering , 2012, WIREs Data Mining Knowl. Discov..

[31]  Fan Yang,et al.  Exploring the diversity in cluster ensemble generation: Random sampling and random projection , 2014, Expert Syst. Appl..

[32]  Jonathan Lee,et al.  GEA: A Goal-Driven Approach toDiscovering Early Aspects , 2014, IEEE Transactions on Software Engineering.

[33]  Wei Yuan,et al.  Enhanced clustering of biomedical documents using ensemble non-negative matrix factorization , 2011, Inf. Sci..

[34]  William F. Punch,et al.  A Comparison of Resampling Methods for Clustering Ensembles , 2004, IC-AI.

[35]  Anil K. Jain,et al.  Multiobjective data clustering , 2004, CVPR 2004.

[36]  Gwilym M. Jenkins,et al.  Time series analysis, forecasting and control , 1972 .

[37]  Yi Peng,et al.  Evaluation of clustering algorithms for financial risk analysis using MCDM methods , 2014, Inf. Sci..

[38]  Joshua Zhexue Huang,et al.  Stratified feature sampling method for ensemble clustering of high dimensional data , 2015, Pattern Recognit..

[39]  Ludmila I. Kuncheva,et al.  Evaluation of Stability of k-Means Cluster Ensembles with Respect to Random Initialization , 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[40]  Milton Pividori,et al.  A very simple and fast way to access and validate algorithms in reproducible research , 2016, Briefings Bioinform..

[41]  Ana L. N. Fred,et al.  Data clustering using evidence accumulation , 2002, Object recognition supported by user interaction for service robots.

[42]  Yun Yang,et al.  Hybrid Sampling-Based Clustering Ensemble With Global and Local Constitutions , 2016, IEEE Transactions on Neural Networks and Learning Systems.

[43]  Fei Wang,et al.  Boosting GMM and Its Two Applications , 2005, Multiple Classifier Systems.

[44]  J. Dunn Well-Separated Clusters and Optimal Fuzzy Partitions , 1974 .

[45]  Ertunc Erdil,et al.  Combining multiple clusterings using similarity graph , 2011, Pattern Recognit..

[46]  Carla E. Brodley,et al.  Random Projection for High Dimensional Data Clustering: A Cluster Ensemble Approach , 2003, ICML.

[47]  Michalis Vazirgiannis,et al.  On Clustering Validation Techniques , 2001, Journal of Intelligent Information Systems.

[48]  Yun Yang,et al.  Temporal Data Clustering via Weighted Clustering Ensemble with Different Representations , 2011, IEEE Transactions on Knowledge and Data Engineering.

[49]  Anil K. Jain,et al.  Combining multiple weak clusterings , 2003, Third IEEE International Conference on Data Mining.

[50]  Jinfeng Yi,et al.  Robust Ensemble Clustering by Matrix Completion , 2012, 2012 IEEE 12th International Conference on Data Mining.

[51]  Carla E. Brodley,et al.  Solving cluster ensemble problems by bipartite graph partitioning , 2004, ICML.

[52]  Joachim M. Buhmann,et al.  Bagging for Path-Based Clustering , 2003, IEEE Trans. Pattern Anal. Mach. Intell..

[53]  Xiaoping Fan,et al.  A New Selective Clustering Ensemble Algorithm , 2012, ICEBE.

[54]  Donald W. Bouldin,et al.  A Cluster Separation Measure , 1979, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[55]  Fei Wang,et al.  Generalized Cluster Aggregation , 2009, IJCAI.

[56]  Anil K. Jain,et al.  Multiobjective data clustering , 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004..