Hierarchical cluster ensemble selection

Abstract: The performance of a clustering ensemble depends on two main factors: the diversity and the quality of its members. Selecting a subset of the available ensemble members based on diversity and quality often yields a more accurate ensemble solution. However, there is no clear relationship between diversity and quality when selecting such a subset. This paper proposes the Hierarchical Cluster Ensemble Selection (HCES) method and a new diversity measure to explore how diversity and quality affect the final results. HCES selects ensemble members hierarchically using single-link, average-link, and complete-link agglomerative clustering. A pairwise diversity measure from the recent literature and the proposed diversity measure are each applied with these agglomerative clustering algorithms. Using the proposed diversity measure in HCES yields more diverse ensemble members than using the pairwise diversity measure. The Cluster-based Similarity Partitioning Algorithm (CSPA) and the HyperGraph Partitioning Algorithm (HGPA) are employed in HCES to obtain both the full-ensemble and the selected-ensemble consensus solutions. To evaluate HCES, experiments were conducted on several real data sets, and the results were compared with those of the full ensembles. The results show that HCES yields a significant performance improvement over full ensembles.
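The general idea of hierarchical ensemble selection described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not define the proposed diversity measure or the exact selection rule, so the sketch assumes a pairwise diversity of one minus the Rand index, groups ensemble members with single-, average-, or complete-link agglomerative clustering, and keeps one representative per group. The function names (`rand_index`, `diversity`, `group_partitions`, `select_members`) and the quality proxy (mean agreement with the other members) are hypothetical choices made for illustration only.

```python
from itertools import combinations


def rand_index(a, b):
    """Fraction of object pairs on which two labelings agree."""
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


def diversity(a, b):
    """Assumed pairwise diversity: 1 - Rand index (not the paper's measure)."""
    return 1.0 - rand_index(a, b)


def linkage_distance(g1, g2, dist, method):
    """Distance between two groups of ensemble members under a linkage rule."""
    d = [dist[i][j] for i in g1 for j in g2]
    if method == "single":
        return min(d)
    if method == "complete":
        return max(d)
    if method == "average":
        return sum(d) / len(d)
    raise ValueError("method must be 'single', 'average', or 'complete'")


def group_partitions(partitions, n_groups, method="average"):
    """Agglomeratively merge ensemble members until n_groups groups remain."""
    m = len(partitions)
    dist = [[diversity(partitions[i], partitions[j]) for j in range(m)]
            for i in range(m)]
    groups = [[i] for i in range(m)]
    while len(groups) > n_groups:
        # Merge the two groups that are closest under the chosen linkage.
        p, q = min(
            combinations(range(len(groups)), 2),
            key=lambda pq: linkage_distance(
                groups[pq[0]], groups[pq[1]], dist, method),
        )
        groups[p] = groups[p] + groups[q]
        del groups[q]
    return groups, dist


def select_members(partitions, n_groups, method="average"):
    """Keep one member per group (assumed rule: highest mean agreement)."""
    groups, dist = group_partitions(partitions, n_groups, method)
    m = len(partitions)

    def quality(i):
        # Hypothetical quality proxy: mean Rand index with all other members.
        return sum(1.0 - dist[i][j] for j in range(m) if j != i) / (m - 1)

    return sorted(max(g, key=quality) for g in groups)


# Toy ensemble: members 0 and 1 are the same partition up to relabeling,
# and so are members 2 and 3.
ensemble = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],
    [0, 1, 0, 1, 0, 1],
    [1, 0, 1, 0, 1, 0],
]
selected = select_members(ensemble, n_groups=2, method="average")
print(selected)
```

On this toy ensemble the two pairs of equivalent partitions end up in separate groups, and one member from each group is retained. In a full pipeline the retained members would then be passed to a consensus function such as CSPA or HGPA, which is outside the scope of this sketch.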
