Ensemble Clustering with Novel Weighting Strategy

The goal of ensemble clustering is to improve clustering accuracy by integrating multiple clustering results, and to address the scalability problems of traditional single clustering algorithms. In recent years, ensemble clustering has attracted increasing attention due to its remarkable achievements. However, most existing ensemble clustering approaches treat all base clusterings equally, without considering their validity. Some ensemble clustering algorithms do use a weighting strategy, but they still ignore the negative impact of poorly performing base clusterings. In this paper, we propose an ensemble clustering method based on a novel weighting strategy. Specifically, the validity of each base clustering is measured by the optimal matching score between the base clustering and the whole, which yields the corresponding weight. Then, the weights of base clusterings that contribute negatively are further adjusted to obtain the final weight vector. Subsequently, a weighted co-association matrix is constructed to serve as the ensemble matrix, and a hierarchical clustering algorithm is applied to it to generate the final result. Experimental results on different types of real-world datasets show the superiority of the proposed method.
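
The pipeline described above (weight each base clustering, adjust the weights, build a weighted co-association matrix, apply hierarchical clustering) can be illustrated with a minimal Python sketch. This is not the authors' implementation, and several details are assumptions, since the abstract does not specify them: "the whole" is read as the other base clusterings in the ensemble, the optimal matching score is computed with the Hungarian algorithm on the contingency table, the adjustment step simply zeroes any weight below the ensemble mean, and average linkage is used for the final step. All function names are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import squareform


def matching_score(a, b):
    """Fraction of points that agree under the best one-to-one cluster matching."""
    _, a_idx = np.unique(a, return_inverse=True)
    _, b_idx = np.unique(b, return_inverse=True)
    contingency = np.zeros((a_idx.max() + 1, b_idx.max() + 1), dtype=int)
    np.add.at(contingency, (a_idx, b_idx), 1)
    # Hungarian algorithm: pick the cluster pairing with maximal overlap.
    rows, cols = linear_sum_assignment(contingency, maximize=True)
    return contingency[rows, cols].sum() / len(a_idx)


def ensemble_weights(base):
    """Weight each base clustering by its mean matching score with the others."""
    m = len(base)
    scores = np.array([
        np.mean([matching_score(base[i], base[j]) for j in range(m) if j != i])
        for i in range(m)
    ])
    # ASSUMED adjustment rule: clusterings scoring below the ensemble mean
    # get weight zero. The abstract does not state the actual rule.
    adjusted = np.where(scores >= scores.mean(), scores, 0.0)
    return adjusted / adjusted.sum()


def weighted_co_association(base, weights):
    """Sum of per-clustering same-cluster indicator matrices, scaled by weight."""
    n = base.shape[1]
    co = np.zeros((n, n))
    for labels, w in zip(base, weights):
        co += w * (labels[:, None] == labels[None, :])
    return co


def consensus(base, n_clusters):
    """Return (consensus labels, weight vector) for a stack of base clusterings."""
    base = np.asarray(base)  # shape: (number of clusterings, number of points)
    weights = ensemble_weights(base)
    co = weighted_co_association(base, weights)
    dist = 1.0 - co  # weights sum to 1, so co-association lies in [0, 1]
    np.fill_diagonal(dist, 0.0)
    tree = linkage(squareform(dist, checks=False), method="average")
    return fcluster(tree, t=n_clusters, criterion="maxclust"), weights
```

Called as `labels, weights = consensus(base, n_clusters)` with at least two base clusterings, the sketch returns one consensus label per point together with the normalized weight vector. Swapping in a different validity measure or adjustment rule only requires changing `ensemble_weights`.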
