Optimization of Basic Clustering for Ensemble Clustering: An Information-Theoretic Perspective

Current research on ensemble clustering focuses mainly on integration strategies, while the measurement and optimization of the basic clusterings receive less attention. Based on information entropy theory, this paper proposes a quality metric for basic clusterings and then selects clusterings using two-branch decisions and three-way decisions, respectively. Governed by preset thresholds, two mechanisms are developed: two-branch decision based basic clustering filtering (BCF2BD) and three-way decision based basic clustering filtering (BCF3WD). In BCF2BD, a basic clustering is deleted if its quality metric is less than the preset threshold $\xi$, and a new clustering member is added so that the size of the basic clustering set stays constant. In BCF3WD, a basic clustering is deleted if its quality metric is less than the preset threshold $\beta$, retained if it is greater than the preset threshold $\alpha$, and recalculated if it is greater than $\beta$ and less than $\alpha$. Both mechanisms are executed repeatedly until either the number of basic clusterings stops decreasing or the maximum number of iterations is reached. Comparative experiments show that both filtering algorithms effectively improve the performance of ensemble clustering, and that the three-way decision filtering algorithm consumes less time.
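The abstract gives neither the formula of the entropy-based quality metric nor the exact meaning of "recalculated", so the Python sketch below fills both gaps with labeled assumptions. It scores each basic clustering by its average normalized mutual information (NMI) against the other members, an assumed stand-in for the paper's metric, and then runs the two filtering loops described above. The names `quality`, `bcf2bd`, `bcf3wd` and the `generate` callback are hypothetical; in `bcf3wd`, "recalculated" is assumed to mean that boundary members are re-scored in the next round against the surviving pool.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (nats) of a label vector."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def nmi(a, b):
    """Normalized mutual information I(a;b) / sqrt(H(a) H(b))."""
    n = len(a)
    joint, ca, cb = Counter(zip(a, b)), Counter(a), Counter(b)
    mi = sum((nxy / n) * math.log(nxy * n / (ca[x] * cb[y]))
             for (x, y), nxy in joint.items())
    ha, hb = entropy(a), entropy(b)
    # Assumed convention: a single-cluster partition carries no information.
    return mi / math.sqrt(ha * hb) if ha > 0 and hb > 0 else 0.0


def quality(i, members):
    """HYPOTHETICAL quality metric: average NMI of member i against the others."""
    others = [m for j, m in enumerate(members) if j != i]
    return sum(nmi(members[i], m) for m in others) / len(others) if others else 0.0


def bcf2bd(members, xi, generate, max_iter=10):
    """Two-branch filtering: delete members scoring below xi, refill via generate()."""
    members = list(members)
    for _ in range(max_iter):
        scores = [quality(i, members) for i in range(len(members))]
        kept = [m for m, s in zip(members, scores) if s >= xi]
        if len(kept) == len(members):
            break  # nothing deleted in this pass: stop
        # Add freshly generated members so the ensemble size stays constant.
        members = kept + [generate() for _ in range(len(members) - len(kept))]
    return members


def bcf3wd(members, alpha, beta, max_iter=10):
    """Three-way filtering: retain (> alpha), delete (< beta), defer otherwise."""
    accepted, boundary = [], list(members)
    for _ in range(max_iter):
        pool = accepted + boundary
        if not boundary or len(pool) < 2:
            break
        # Only deferred members are (re)scored, against the current pool.
        scores = [quality(len(accepted) + k, pool) for k in range(len(boundary))]
        deleted = [m for m, s in zip(boundary, scores) if s < beta]
        accepted += [m for m, s in zip(boundary, scores) if s > alpha]
        # Scores equal to a threshold are kept in the boundary region (assumption).
        boundary = [m for m, s in zip(boundary, scores) if beta <= s <= alpha]
        if not deleted:
            break  # the ensemble did not shrink: stop
    return accepted + boundary
```

As a small illustration, take `A = [0, 0, 0, 0, 1, 1, 1, 1]`, `B = A`, `C = [1, 1, 1, 1, 0, 0, 0, 0]` (the same partition as `A`, relabeled) and `D = [0, 1, 0, 1, 0, 1, 0, 1]` (independent of `A`). `A`, `B` and `C` each score 2/3 and `D` scores 0. With `alpha=0.9, beta=0.3`, `bcf3wd` first defers `A`, `B`, `C` and deletes `D`; re-scoring the three survivors gives 1.0 each, so they are retained. Deferring borderline members instead of immediately replacing them is one plausible reading of why the three-way variant can save time, since it avoids regenerating clusterings.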
