Enhanced Ensemble Clustering via Fast Propagation of Cluster-Wise Similarities

Ensemble clustering has been a popular research topic in data mining and machine learning. Despite significant progress in recent years, current ensemble clustering research still faces two challenging issues. First, most existing algorithms exploit the ensemble information at the object level and often fail to explore the rich information available at higher levels of granularity. Second, they mostly focus on the direct connections (e.g., direct intersections or pair-wise co-occurrence) among the multiple base clusterings, but generally neglect the multiscale indirect relationships hidden in them. To address these two issues, this paper presents a novel ensemble clustering approach based on fast propagation of cluster-wise similarities via random walks. We first construct a cluster similarity graph in which the base clusters are treated as graph nodes and the cluster-wise Jaccard coefficient is used to compute the initial edge weights. On this graph, a transition probability matrix is defined, and a random walk process driven by it is conducted to propagate the graph structural information. Specifically, by investigating the propagation trajectories starting from different nodes, a new cluster-wise similarity matrix is derived from the relationships between these trajectories. The newly obtained cluster-wise similarity matrix is then mapped from the cluster level to the object level to yield an enhanced co-association matrix, which simultaneously captures the object-wise co-occurrence relationships and the multiscale cluster-wise relationships in the ensemble. Finally, two novel consensus functions are proposed to obtain the consensus clustering result. Extensive experiments on a variety of real-world datasets demonstrate the effectiveness and efficiency of our approach.
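The pipeline described in the abstract can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: base clusterings are given as an integer label matrix, self-loops are dropped from the Jaccard graph, the trajectory of a node is taken to be the concatenation of its rows in P^1, ..., P^T, trajectory relationships are measured by cosine similarity, and the cluster-to-object mapping averages the similarity between the two objects' clusters over all base clusterings. The function names and the walk length `T` are hypothetical.

```python
import numpy as np

def cluster_jaccard_graph(base_labels):
    """Build the cluster similarity graph (base clusters as nodes).

    base_labels: (M, N) int array; row m holds the labels of the N objects
    in the m-th base clustering. Returns the (K, K) Jaccard weight matrix
    over all K base clusters and an (M, N) map from objects to node indices.
    """
    base_labels = np.asarray(base_labels)
    M, N = base_labels.shape
    members = []                                  # membership mask per base cluster
    node_of = np.zeros((M, N), dtype=int)
    for m in range(M):
        for lab in np.unique(base_labels[m]):
            mask = base_labels[m] == lab
            node_of[m, mask] = len(members)
            members.append(mask)
    B = np.array(members, dtype=float)            # (K, N) cluster-object incidence
    inter = B @ B.T                               # |C_i ∩ C_j|
    sizes = B.sum(axis=1)
    union = sizes[:, None] + sizes[None, :] - inter
    W = inter / union                             # Jaccard coefficient (clusters are non-empty)
    np.fill_diagonal(W, 0.0)                      # assumption: no self-loops
    return W, node_of

def trajectory_similarity(W, T=3):
    """Cosine similarity between T-step random-walk trajectories (assumed form)."""
    rs = W.sum(axis=1, keepdims=True)
    P = np.divide(W, rs, out=np.zeros_like(W), where=rs > 0)  # row-stochastic transitions
    steps, Pt = [], np.eye(len(W))
    for _ in range(T):
        Pt = Pt @ P                               # rows of P^t: t-step distributions
        steps.append(Pt)
    traj = np.hstack(steps)                       # row i = [P^1_i, ..., P^T_i]
    norms = np.linalg.norm(traj, axis=1, keepdims=True)
    U = traj / np.where(norms > 0, norms, 1.0)
    Z = U @ U.T
    np.fill_diagonal(Z, 1.0)                      # each cluster is fully similar to itself
    return Z

def enhanced_coassociation(base_labels, T=3):
    """Map cluster-wise similarities to an object-level (N, N) matrix."""
    W, node_of = cluster_jaccard_graph(base_labels)
    Z = trajectory_similarity(W, T)
    M = node_of.shape[0]
    # Average, over base clusterings, the similarity of the two objects' clusters.
    return sum(Z[np.ix_(node_of[m], node_of[m])] for m in range(M)) / M

# Usage: three base clusterings of six objects.
labels = np.array([[0, 0, 1, 1, 2, 2],
                   [0, 0, 0, 1, 1, 1],
                   [0, 0, 1, 1, 1, 2]])
eca = enhanced_coassociation(labels, T=3)
print(np.round(eca, 2))
```

Because every cluster has similarity 1 with itself, each entry of this matrix is at least the corresponding entry of the conventional co-association matrix; the random walk only adds credit for object pairs whose clusters are linked, directly or indirectly, through other base clusters.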
