Robust Clustering by Aggregation and Intersection Methods

When dealing with multiple clustering solutions, the problem of extracting a small number of good, distinct solutions becomes crucial. This problem is addressed by so-called Meta Clustering [12], which produces clusters of clustering solutions. Such groups, called meta-clusters, often represent alternative ways of grouping the original data. The next step is to construct a clustering that represents a chosen meta-cluster. In this work, starting from a population of solutions, we build meta-clusters with a hierarchical agglomerative approach based on an entropy-based similarity measure. The threshold value is selected by the user through interactive visualizations. Once a meta-cluster is selected, its representative clustering is constructed following two different consensus approaches. The process is illustrated on a synthetic dataset.
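
As an illustration of the pipeline outlined in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes that the entropy-based measure is the variation of information (the abstract does not name the measure), that average linkage is used for the agglomeration, and that a fixed numeric threshold stands in for the interactive selection. The function names (`variation_of_information`, `meta_cluster`, `intersection_consensus`) are hypothetical, and the intersection rule shown is only one plausible reading of an intersection-style consensus.

```python
from collections import Counter
from math import log


def entropy(labels):
    """Shannon entropy (natural log) of a labeling."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def variation_of_information(a, b):
    """Assumed entropy-based distance: VI(a, b) = 2*H(a, b) - H(a) - H(b)."""
    joint = list(zip(a, b))
    return 2 * entropy(joint) - entropy(a) - entropy(b)


def meta_cluster(solutions, threshold):
    """Average-linkage agglomeration of clustering solutions.

    Merging stops when the two closest groups are farther apart than
    `threshold` (a stand-in for the user's interactive choice).
    Returns groups of indices into `solutions`.
    """
    groups = [[i] for i in range(len(solutions))]

    def linkage(g, h):
        total = sum(variation_of_information(solutions[i], solutions[j])
                    for i in g for j in h)
        return total / (len(g) * len(h))

    while len(groups) > 1:
        d, x, y = min((linkage(groups[x], groups[y]), x, y)
                      for x in range(len(groups))
                      for y in range(x + 1, len(groups)))
        if d > threshold:
            break
        groups[x] = groups[x] + groups[y]
        del groups[y]  # y > x, so index x is unaffected
    return groups


def intersection_consensus(solutions):
    """Two points share a consensus cluster only if every solution
    in the meta-cluster places them together."""
    ids = {}
    return [ids.setdefault(sig, len(ids)) for sig in zip(*solutions)]


# Four toy solutions over six points: two pairs of relabeled partitions.
solutions = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],
    [0, 1, 0, 1, 0, 1],
    [1, 0, 1, 0, 1, 0],
]
groups = meta_cluster(solutions, threshold=0.5)
representative = intersection_consensus([solutions[i] for i in groups[0]])
```

In this toy population, solutions that differ only by relabeling have zero distance and end up in the same meta-cluster, while the two genuinely different partitions stay apart. An aggregation-style consensus, for example one based on how often pairs of points are co-clustered, could be substituted for `intersection_consensus` without changing the rest of the sketch.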

[1] Antonino Staiano, et al. A multi-step approach to time series analysis and gene expression clustering, 2006, Bioinform.

[2] Olli Nevalainen, et al. Reallocation of GLA codevectors for evading local minimum, 1996.

[3] Michele Pinelli, et al. Interactive data analysis and clustering of genomic data, 2008, Neural Networks.

[4] Dan Gusfield, et al. Partition-distance: A problem and class of perfect graphs arising in clustering, 2002, Inf. Process. Lett.

[5] A. Bertoni, et al. Random projections for assessing gene expression cluster stability, 2005, Proceedings. 2005 IEEE International Joint Conference on Neural Networks, 2005.

[6] Antonino Staiano, et al. Clustering and visualization approaches for human cell cycle gene expression data analysis, 2008, Int. J. Approx. Reason.

[7] Nabil H. Mustafa, et al. k-means projective clustering, 2004, PODS.

[8] Rui Xu, et al. Survey of clustering algorithms, 2005, IEEE Transactions on Neural Networks.

[9] Rich Caruana, et al. Meta Clustering, 2006, Sixth International Conference on Data Mining (ICDM'06).

[10] Jean-Pierre Barthélemy, et al. The Median Procedure for Partitions, 1993, Partitioning Data Sets.

[11] Giorgio Valentini, et al. Characterization of lung tumor subtypes through gene expression cluster validity assessment, 2006, RAIRO Theor. Informatics Appl.

[12] Sam Yuan Sung, et al. Consensus clustering, 2005, Intell. Data Anal.

[13] Ludmila I. Kuncheva, et al. Evaluation of Stability of k-Means Cluster Ensembles with Respect to Random Initialization, 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[14] Aidong Zhang, et al. Cluster analysis for gene expression data: a survey, 2004, IEEE Transactions on Knowledge and Data Engineering.

[15] Rich Caruana, et al. Consensus Clusterings, 2007, Seventh IEEE International Conference on Data Mining (ICDM 2007).

[16] Petra Perner, et al. Advances in Data Mining, 2002, Lecture Notes in Computer Science.

[17] Anthony Wirth, et al. Are approximation algorithms for consensus clustering worthwhile?, 2007, SDM.

[18] Anil K. Jain, et al. Adaptive clustering ensembles, 2004, ICPR 2004.

[19] Anil K. Jain, et al. Clustering ensembles: models of consensus and weak partitions, 2005, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[20] Y. P. Hu, et al. Global optimization in clustering using hyperbolic cross points, 2007, Pattern Recognit.

[21] Ming-Yang Kao, et al. On constructing an optimal consensus clustering from multiple clusterings, 2007, Inf. Process. Lett.

[22] Aristides Gionis, et al. Clustering aggregation, 2005, 21st International Conference on Data Engineering (ICDE'05).

[23] Francesco Napolitano, et al. Using Global Optimization to Explore Multiple Solutions of Clustering Problems, 2008, KES.