Clustering ensemble selection considering quality and diversity

It is quite possible for a partition that a stability measure judges to be poor to contain one or more high-quality clusters, and such a partition is then neglected entirely. Inspired by the evaluation of partitions, researchers have therefore defined measures for evaluating individual clusters. Many stability measures, such as Normalized Mutual Information (NMI), have been proposed to validate a partition, and the cluster-level measures defined so far are also based on NMI. This paper discusses the drawback of this commonly used approach. It proposes a criterion for assessing the association between a cluster and a partition, called Edited Normalized Mutual Information (ENMI), which compensates for the drawback of the common NMI measure. The paper also proposes a clustering ensemble method that aggregates a subset of the primary clusters. The method uses Average ENMI as a fitness measure for selecting clusters: the clusters that satisfy a predefined threshold of this measure are selected to participate in the final ensemble. Three classes of consensus functions are employed to combine the chosen clusters. The first class consists of co-association based consensus functions. Because the Evidence Accumulation Clustering (EAC) method cannot derive the co-association matrix from a subset of clusters, Extended EAC (EEAC) is employed to construct the co-association matrix from the chosen clusters. The second class is based on hypergraph partitioning algorithms. The third class treats the chosen clusters as a new feature space and uses a simple clustering algorithm to extract the consensus partition. Empirical studies show that the proposed method outperforms other well-known ensemble methods.
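The abstract does not give the exact ENMI or EEAC formulas. The sketch below therefore illustrates only the overall pipeline of the co-association route:

1. Score every primary cluster against the ensemble.
2. Keep the clusters whose score reaches a threshold.
3. Build a co-association matrix from that subset.
4. Cluster the matrix.

Several pieces are assumptions, not the paper's method. The cluster score is plain NMI between the cluster-vs-rest labeling and each ensemble partition, averaged over partitions. This is a stand-in for Average ENMI. The co-association normalization `n_ij / max(n_i, n_j)` is our reading of EEAC and should be treated as hypothetical. Average linkage is one possible consensus step. All function names, the toy data, and the threshold `0.56` are illustrative.

```python
import math
from collections import Counter

def nmi(a, b):
    """NMI between two labelings, normalized by the geometric mean of their entropies."""
    n = len(a)
    ca, cb, cab = Counter(a), Counter(b), Counter(zip(a, b))
    mi = sum(c / n * math.log(n * c / (ca[x] * cb[y])) for (x, y), c in cab.items())
    ha = -sum(c / n * math.log(c / n) for c in ca.values())
    hb = -sum(c / n * math.log(c / n) for c in cb.values())
    if ha == 0 or hb == 0:  # a one-cluster labeling carries no information
        return 1.0 if ha == hb else 0.0
    return mi / math.sqrt(ha * hb)

def primary_clusters(partitions):
    """Every cluster of every ensemble partition, as a frozenset of point indices."""
    return [frozenset(i for i, lab in enumerate(p) if lab == label)
            for p in partitions for label in sorted(set(p))]

def cluster_score(cluster, partitions):
    """ASSUMPTION: stand-in for Average ENMI (not the paper's ENMI).
    Mean NMI between the cluster-vs-rest labeling and each partition."""
    n = len(partitions[0])
    binary = [int(i in cluster) for i in range(n)]
    return sum(nmi(binary, p) for p in partitions) / len(partitions)

def eeac_matrix(selected, n):
    """Co-association built from a subset of clusters.
    ASSUMPTION: entry (i, j) is n_ij / max(n_i, n_j), where n_i counts the
    selected clusters containing point i and n_ij those containing both."""
    count = [sum(i in c for c in selected) for i in range(n)]
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            denom = max(count[i], count[j])
            if denom:  # points covered by no selected cluster keep similarity 0
                sim[i][j] = sum(i in c and j in c for c in selected) / denom
    return sim

def average_linkage(sim, k):
    """Agglomerative clustering on a similarity matrix; stops at k groups."""
    groups = [[i] for i in range(len(sim))]
    while len(groups) > k:
        best, pair = None, None
        for a in range(len(groups)):
            for b in range(a + 1, len(groups)):
                s = sum(sim[i][j] for i in groups[a] for j in groups[b])
                s /= len(groups[a]) * len(groups[b])
                if best is None or s > best:
                    best, pair = s, (a, b)
        a, b = pair
        merged = groups[a] + groups[b]
        groups = [g for idx, g in enumerate(groups) if idx not in (a, b)] + [merged]
    return sorted(sorted(g) for g in groups)

# Hypothetical ensemble of three partitions over six points.
partitions = [[0, 0, 0, 1, 1, 1], [0, 0, 0, 1, 1, 2], [0, 0, 1, 1, 2, 2]]
clusters = primary_clusters(partitions)
selected = [c for c in clusters if cluster_score(c, partitions) >= 0.56]
sim = eeac_matrix(selected, n=6)
print(average_linkage(sim, k=2))  # [[0, 1, 2], [3, 4, 5]]
```

In this toy run, the noisy clusters `{2, 3}` and `{5}` fall below the threshold. The consensus is then driven by the well-supported clusters. The other two consensus classes named above (hypergraph partitioning, and clustering the chosen clusters as a new feature space) are not sketched here.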