A multiple k-means clustering ensemble algorithm to find nonlinearly separable clusters

Abstract: Cluster ensembles are an important research topic in ensemble learning. They aggregate several base clusterings into a single output clustering with improved robustness and quality. Because clustering is unsupervised and “accuracy” has no clear meaning, most existing ensemble methods seek the clustering that is most consistent with the base clusterings. However, it is difficult for these methods to realize the principle that “multiple weak clusterers equal a strong one”. For example, on a data set with nonlinearly separable clusters, if the base clusterings are produced by linear clusterers, these methods generally cannot integrate them into a good nonlinear clustering. In this paper, we select k-means as the base clusterer and propose an ensemble clusterer (algorithm) that combines multiple k-means clusterings based on a local hypothesis. For the new algorithm, we study the extraction of locally credible labels from a base clustering, the production of different base clusterings, the construction of the cluster relation, and the final assignment of each object. The proposed ensemble clusterer not only inherits the scalability of k-means but also overcomes its limitation of finding only linearly separable clusters. Finally, experimental results illustrate its effectiveness and efficiency.
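
The limitation of k-means mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's ensemble algorithm. It is a plain Lloyd-style k-means run on two concentric rings, a standard example of nonlinearly separable clusters. The helper names (`make_rings`, `kmeans`, `two_cluster_accuracy`), the ring radii, and the seed are all assumptions chosen for illustration. Because two centroids always split the plane along a straight line, a single k-means run with k = 2 cannot place each ring in its own cluster.

```python
import math
import random


def make_rings(n_per_ring=60, radii=(1.0, 3.0)):
    """Return points evenly spaced on concentric rings, plus their ring index."""
    points, truth = [], []
    for ring, radius in enumerate(radii):
        for i in range(n_per_ring):
            angle = 2 * math.pi * i / n_per_ring
            points.append((radius * math.cos(angle), radius * math.sin(angle)))
            truth.append(ring)
    return points, truth


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd iterations; returns one cluster label per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initial centroids drawn from the data
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(
                range(k),
                key=lambda c: (p[0] - centers[c][0]) ** 2 + (p[1] - centers[c][1]) ** 2,
            )
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centers.append(
                    (
                        sum(x for x, _ in members) / len(members),
                        sum(y for _, y in members) / len(members),
                    )
                )
            else:
                new_centers.append(centers[c])  # keep an empty cluster's centroid
        if new_centers == centers:
            break
        centers = new_centers
    return labels


def two_cluster_accuracy(truth, labels):
    """Agreement with the true rings, allowing the two labels to be swapped."""
    agree = sum(t == l for t, l in zip(truth, labels)) / len(truth)
    return max(agree, 1.0 - agree)


points, truth = make_rings()
labels = kmeans(points, k=2)
print("agreement with the true rings:", two_cluster_accuracy(truth, labels))
```

The printed agreement stays below 1.0 for any initialization, since each k-means cluster is the part of the data lying on one side of a line. This is the kind of weak, linear base clustering that the abstract proposes to combine into a nonlinear result.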
