Performance of an Ensemble Clustering Algorithm on Biological Data Sets

Ensemble clustering is a promising approach that combines the results of multiple clustering algorithms into a consensus partition by merging the individual partitions according to well-defined rules. In this study, we use an ensemble clustering approach to merge the results of five clustering algorithms that are sometimes used in bioinformatics applications. The ensemble result is tested on microarray data sets and compared with the results of the individual algorithms. One external cluster validation index, the adjusted Rand index (C-rand), and two internal cluster validation indices, silhouette and modularity, are used for the comparison.
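As a rough illustration of the idea, the sketch below merges several partitions into a consensus partition and scores it with the adjusted Rand index. It is not the merging rule used in this study, which the abstract does not specify. It assumes a simple co-association (evidence accumulation) scheme: two items are linked when more than half of the input partitions place them in the same cluster, and the connected components of those links form the consensus. The function names, the 0.5 threshold, and the toy label vectors are all hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import comb


def co_association(partitions):
    """Fraction of partitions in which each pair of items shares a cluster."""
    n = len(partitions[0])
    return {
        (i, j): sum(labels[i] == labels[j] for labels in partitions) / len(partitions)
        for i, j in combinations(range(n), 2)
    }


def consensus_partition(partitions, threshold=0.5):
    """Link pairs co-clustered in more than `threshold` of the partitions
    (assumed rule), then label the connected components via union-find."""
    n = len(partitions[0])
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for (i, j), frac in co_association(partitions).items():
        if frac > threshold:
            parent[find(i)] = find(j)

    # Relabel component roots as 0, 1, 2, ... in order of first appearance.
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index (Hubert and Arabie) between two labelings."""
    n = len(labels_a)
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # degenerate case, e.g. both labelings trivial
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


# Toy data: six items, a reference labeling, and five hypothetical partitions
# standing in for the outputs of five different clustering algorithms.
reference = [0, 0, 0, 1, 1, 1]
partitions = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 1, 0, 2, 2, 2],
]

consensus = consensus_partition(partitions)
consensus_score = adjusted_rand_index(reference, consensus)
individual_scores = [adjusted_rand_index(reference, p) for p in partitions]
```

Comparing `consensus_score` with `individual_scores` mirrors the external-index comparison described above. The internal indices (silhouette and modularity) need the underlying data or graph rather than labels alone, so they are left out of this sketch.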
