A Novel Hierarchical-Clustering-Combination Scheme Based on Fuzzy-Similarity Relations

Clustering-combination methods have received considerable attention in recent years, and many ensemble-based clustering methods have been introduced. However, clustering-combination techniques have been limited to "flat" clustering combination, and the combination of hierarchical clusterings has yet to be addressed. In this paper, we address and formalize the concept of hierarchical-clustering combination and introduce an algorithmic framework in which multiple hierarchical clusterings can be easily combined. In this framework, the similarity-based description matrices of the input hierarchical clusterings are aggregated into a transitive consensus matrix from which the final hierarchy can be formed. Empirical evaluation on popular available datasets confirms the superiority of the combined hierarchical clustering produced by our method over standard (single) hierarchical-clustering methods.
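The aggregation step described above can be illustrated with a minimal sketch. The abstract does not specify the aggregation operator or the kind of transitivity, so this sketch assumes an element-wise mean of the input similarity matrices followed by a max-min transitive closure. The function names (`aggregate`, `max_min_closure`, `is_min_transitive`) and the toy matrices are hypothetical and are not taken from the paper.

```python
from typing import List

Matrix = List[List[float]]


def aggregate(matrices: List[Matrix]) -> Matrix:
    """Element-wise mean of equally sized similarity matrices (an assumed operator)."""
    m = len(matrices)
    n = len(matrices[0])
    return [[sum(mat[i][j] for mat in matrices) / m for j in range(n)]
            for i in range(n)]


def max_min_closure(sim: Matrix) -> Matrix:
    """Max-min transitive closure via a Floyd-Warshall style pass, O(n^3)."""
    n = len(sim)
    closed = [row[:] for row in sim]  # copy so the input is left untouched
    for k in range(n):
        for i in range(n):
            for j in range(n):
                # Strength of the path i -> k -> j is its weakest link.
                via_k = min(closed[i][k], closed[k][j])
                if via_k > closed[i][j]:
                    closed[i][j] = via_k
    return closed


def is_min_transitive(sim: Matrix, tol: float = 1e-12) -> bool:
    """Check sim[i][j] >= min(sim[i][k], sim[k][j]) for every triple (i, j, k)."""
    n = len(sim)
    return all(sim[i][j] + tol >= min(sim[i][k], sim[k][j])
               for i in range(n) for j in range(n) for k in range(n))


# Toy similarity descriptions of two hierarchies over three items (made-up values).
h1 = [[1.0, 0.8, 0.2],
      [0.8, 1.0, 0.4],
      [0.2, 0.4, 1.0]]
h2 = [[1.0, 0.6, 0.6],
      [0.6, 1.0, 0.2],
      [0.6, 0.2, 1.0]]

mean_matrix = aggregate([h1, h2])          # generally not transitive
consensus = max_min_closure(mean_matrix)   # transitive consensus matrix
```

A symmetric, reflexive, min-transitive matrix is useful here because thresholding it at any level yields an equivalence relation, and the partitions obtained at decreasing thresholds are nested, which is exactly the structure of a dendrogram.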
