Cluster ensembles: A survey of approaches with recent extensions and applications

Abstract

Cluster ensembles have been shown to improve accuracy and robustness over any single standard clustering algorithm across different data collections. This meta-learning formalism also spares users the dilemma of choosing an appropriate clustering technique and its parameters for a given data set. Almost two decades after the first publication of its kind, the approach has proven effective in many problem domains, especially microarray data analysis and its downstream applications. It has recently been extended considerably, both in theoretical modelling and in practical problem solving. This survey responds to that growing attention by presenting the fundamental basis and theoretical details of state-of-the-art methods in the current literature. It covers the range of ensemble generation strategies, the summarization and representation of ensemble members, and consensus clustering. The review also covers applications and extensions of cluster ensembles and highlights several open research issues and challenges.
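The three stages named above (ensemble generation, representation of the members, and consensus) can be illustrated with a minimal sketch. It is not taken from any surveyed method, and it uses only NumPy. The choices below are illustrative assumptions: generation uses plain k-means runs with random initialization and a randomly drawn number of clusters between 2 and 6, representation uses a co-association matrix C, and consensus uses a 0.5 threshold. The consensus step takes the connected components of the graph whose edges satisfy C[i, j] > threshold, which corresponds to cutting a single-linkage dendrogram built on the distance 1 - C. The function names `kmeans_labels`, `generate_ensemble`, `co_association` and `consensus_labels` are hypothetical.

```python
import numpy as np


def kmeans_labels(X, k, rng, n_iter=50):
    """Lloyd's k-means from k randomly chosen data points; returns one label vector."""
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign every point to its nearest centre (squared Euclidean distance).
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        # Move each centre to the mean of its points; keep it in place if its cluster is empty.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def generate_ensemble(X, n_members, k_range, seed=0):
    """Generation (assumed strategy): vary the random initialisation and the number of clusters."""
    rng = np.random.default_rng(seed)
    low, high = k_range
    return [kmeans_labels(X, int(rng.integers(low, high + 1)), rng) for _ in range(n_members)]


def co_association(ensemble):
    """Representation: C[i, j] is the fraction of members that put i and j in the same cluster."""
    n = len(ensemble[0])
    C = np.zeros((n, n))
    for labels in ensemble:
        C += labels[:, None] == labels[None, :]
    return C / len(ensemble)


def consensus_labels(C, threshold=0.5):
    """Consensus: connected components of the graph with an edge wherever C[i, j] > threshold."""
    n = len(C)
    labels = np.full(n, -1)
    current = 0
    for start in range(n):
        if labels[start] >= 0:
            continue
        labels[start] = current
        stack = [start]
        while stack:  # depth-first search over strongly co-associated pairs
            i = stack.pop()
            for j in np.flatnonzero(C[i] > threshold):
                if labels[j] < 0:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


# Usage on two well-separated synthetic Gaussian blobs (illustrative data only).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.3, size=(20, 2)), rng.normal(5.0, 0.3, size=(20, 2))])
ensemble = generate_ensemble(X, n_members=10, k_range=(2, 6))
C = co_association(ensemble)
final = consensus_labels(C, threshold=0.5)
print("consensus clusters:", len(np.unique(final)))
```

The co-association matrix is one common way to summarize ensemble members because it does not require matching cluster labels across partitions. Members with different numbers of clusters therefore combine directly. The threshold is a tunable assumption. Raising it splits groups that only some members agree on, and lowering it merges them.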
