Co-Clustering Ensemble Based on Bilateral K-Means Algorithm

Clustering ensemble technique has been shown to be effective in improving the accuracy and stability of single clustering algorithms. With the development of information technology, the amount of data, such as image, text and video, has increased rapidly. Efficiently clustering these large-scale datasets is a challenge. Clustering ensembles usually transform clustering results to a co-association matrix, and then to a graph-partition problem. These methods may suffer from information loss when computing the similarity among samples or base clusterings. Rich information between samples and base clusterings is ignored. Moreover, the results are not discrete. They need post-processing steps to obtain the final clustering result, which will deviate greatly from the real clustering result. To address this problem, we propose a co-clustering ensemble based on bilateral k-means (CEBKM) algorithm. Our algorithm can simultaneously cluster samples and base clusterings of a dataset, to fully exploit the potential information between the samples and the base clusterings. In addition, it can directly obtain the final clustering results without using other clustering algorithms. The proposed method, outperformed several state-of-the-art clustering ensemble methods in experiments conducted on real-world and toy datasets.

[1]  R. Sharan,et al.  CLICK: a clustering algorithm with applications to gene expression analysis. , 2000, Proceedings. International Conference on Intelligent Systems for Molecular Biology.

[2]  Inderjit S. Dhillon,et al.  Co-clustering documents and words using bipartite spectral graph partitioning , 2001, KDD '01.

[3]  Chang-Dong Wang,et al.  Robust Ensemble Clustering Using Probability Trajectories , 2016, IEEE Transactions on Knowledge and Data Engineering.

[4]  Hamid Parvin,et al.  To improve the quality of cluster ensembles by selecting a subset of base clusters , 2014, J. Exp. Theor. Artif. Intell..

[5]  Carla E. Brodley,et al.  Solving cluster ensemble problems by bipartite graph partitioning , 2004, ICML.

[6]  M. Cugmas,et al.  On comparing partitions , 2015 .

[7]  Muhammad Yousefnezhad,et al.  Wisdom of Crowds cluster ensemble , 2016, Intell. Data Anal..

[8]  Daoqiang Zhang,et al.  WoCE: A framework for Clustering Ensemble by Exploiting the Wisdom of Crowds Theory , 2016, IEEE Transactions on Cybernetics.

[9]  Ana L. N. Fred,et al.  Combining multiple clusterings using evidence accumulation , 2005, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[10]  Hamid Parvin,et al.  A clustering ensemble framework based on elite selection of weighted clusters , 2013, Adv. Data Anal. Classif..

[11]  William F. Punch,et al.  Effects of resampling method and adaptation on clustering ensemble efficacy , 2011, Artificial Intelligence Review.

[12]  Chang-Dong Wang,et al.  Ultra-Scalable Spectral Clustering and Ensemble Clustering , 2019, IEEE Transactions on Knowledge and Data Engineering.

[13]  Hamid Parvin,et al.  Diversity based cluster weighting in cluster ensemble: an information theory approach , 2019, Artificial Intelligence Review.

[14]  Hamid Parvin,et al.  Clustering ensemble selection considering quality and diversity , 2015, Artificial Intelligence Review.

[15]  Joachim M. Buhmann,et al.  Bagging for Path-Based Clustering , 2003, IEEE Trans. Pattern Anal. Mach. Intell..

[16]  Chang-Dong Wang,et al.  Ensemble clustering using factor graph , 2016, Pattern Recognit..

[17]  Chang-Dong Wang,et al.  Locally Weighted Ensemble Clustering , 2016, IEEE Transactions on Cybernetics.

[18]  Hamid Parvin,et al.  A fuzzy clustering ensemble based on cluster clustering and iterative Fusion of base clusters , 2019, Applied intelligence (Boston).

[19]  Carla E. Brodley,et al.  Random Projection for High Dimensional Data Clustering: A Cluster Ensemble Approach , 2003, ICML.

[20]  Hamid Parvin,et al.  A comprehensive study of clustering ensemble weighting based on cluster quality and diversity , 2017, Pattern Analysis and Applications.

[21]  Minoru Sasaki,et al.  Ensemble document clustering using weighted hypergraph generated by NMF , 2007, ACL.

[22]  Jitender S. Deogun,et al.  Conceptual clustering in information retrieval , 1998, IEEE Trans. Syst. Man Cybern. Part B.

[23]  Jianbo Shi,et al.  Multiclass spectral clustering , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[24]  William F. Punch,et al.  Ensembles of partitions via data resampling , 2004, International Conference on Information Technology: Coding and Computing, 2004. Proceedings. ITCC 2004..

[25]  Ana L. N. Fred,et al.  Data clustering using evidence accumulation , 2002, Object recognition supported by user interaction for service robots.

[26]  Chang-Dong Wang,et al.  Enhanced Ensemble Clustering via Fast Propagation of Cluster-Wise Similarities , 2018, IEEE Transactions on Systems, Man, and Cybernetics: Systems.

[27]  Carlotta Domeniconi,et al.  Weighted cluster ensembles: Methods and analysis , 2009, TKDD.

[28]  William F. Punch,et al.  Data weighing mechanisms for clustering ensembles , 2013, Comput. Electr. Eng..

[29]  Aidong Zhang,et al.  Cluster analysis for gene expression data: a survey , 2004, IEEE Transactions on Knowledge and Data Engineering.

[30]  Sandro Vega-Pons,et al.  Weighted partition consensus via kernels , 2010, Pattern Recognit..

[31]  Xi Wang,et al.  Clustering aggregation by probability accumulation , 2009, Pattern Recognit..

[32]  Anil K. Jain,et al.  Clustering ensembles: models of consensus and weak partitions , 2005, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[33]  Vipin Kumar,et al.  A Fast and High Quality Multilevel Scheme for Partitioning Irregular Graphs , 1998, SIAM J. Sci. Comput..

[34]  J.A.F. Costa,et al.  Cluster analysis using self-organizing maps and image processing techniques , 1999, IEEE SMC'99 Conference Proceedings. 1999 IEEE International Conference on Systems, Man, and Cybernetics (Cat. No.99CH37028).

[35]  Hamid Parvin,et al.  A clustering ensemble framework based on selection of fuzzy weighted clusters in a locally adaptive clustering algorithm , 2013, Pattern Analysis and Applications.

[36]  Mohamed S. Kamel,et al.  Cumulative Voting Consensus Method for Partitions with Variable Number of Clusters , 2008, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[37]  Carlotta Domeniconi,et al.  Weighted-object ensemble clustering: methods and analysis , 2016, Knowledge and Information Systems.

[38]  Hui Xiong,et al.  K-Means-Based Consensus Clustering: A Unified View , 2015, IEEE Transactions on Knowledge and Data Engineering.

[39]  Zhimao Lu,et al.  An Efficient Spectral Method for Document Cluster Ensemble , 2008, 2008 The 9th International Conference for Young Computer Scientists.

[40]  Joydeep Ghosh,et al.  Cluster Ensembles --- A Knowledge Reuse Framework for Combining Multiple Partitions , 2002, J. Mach. Learn. Res..

[41]  Dingcheng Li,et al.  Spectral co-clustering ensemble , 2015, Knowl. Based Syst..

[42]  Anil K. Jain,et al.  Combining multiple weak clusterings , 2003, Third IEEE International Conference on Data Mining.

[43]  Michael I. Jordan,et al.  On Spectral Clustering: Analysis and an algorithm , 2001, NIPS.

[44]  Hamid Parvin,et al.  A clustering ensemble learning method based on the ant colony clustering algorithm , 2017 .

[45]  Hamid Parvin,et al.  Cluster ensemble selection based on a new cluster stability measure , 2014, Intell. Data Anal..