A stability-based algorithm to validate hierarchical clusters of genes

Stability-based methods have been successfully applied in functional genomics to assess the reliability of clusterings with a relatively small number of examples and clusters. Applying these methods to the validation of gene clusters discovered in biomolecular data can become computationally demanding because of the large number of possible clusters involved. To address this problem, we present a stability-based algorithm that discovers significant clusters in hierarchical clusterings with a large number of examples and clusters. We use the proposed algorithm to analyse the reliability of gene clusters discovered in gene expression data from patients affected by human myeloid leukaemia, and we test their relationships with specific biological processes by means of Gene Ontology-based functional enrichment methods.
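To make the notion of cluster stability concrete, the following Python sketch scores one cluster by how well it is recovered across clusterings of perturbed data. It is only a generic illustration and not the algorithm presented in this work: the Jaccard-based score, the function names `jaccard` and `cluster_stability`, and the gene labels are all assumptions introduced here.

```python
from typing import Hashable, Iterable, List, Set


def jaccard(a: Iterable[Hashable], b: Iterable[Hashable]) -> float:
    """Jaccard similarity between two collections of items (e.g. gene IDs)."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    # Two empty sets are treated as identical.
    return len(set_a & set_b) / len(union) if union else 1.0


def cluster_stability(
    cluster: Set[Hashable],
    perturbed_clusterings: List[List[Set[Hashable]]],
) -> float:
    """Mean best-match Jaccard similarity of `cluster` over perturbed runs.

    Each element of `perturbed_clusterings` is one clustering (a list of
    clusters) obtained from a perturbed version of the data, for instance
    by resampling or random projection. Values near 1 suggest the cluster
    is consistently recovered; values near 0 suggest it is not.
    """
    best_matches = [
        max((jaccard(cluster, other) for other in clustering), default=0.0)
        for clustering in perturbed_clusterings
    ]
    return sum(best_matches) / len(best_matches) if best_matches else 0.0


# Hypothetical example: one candidate gene cluster and two perturbed runs.
candidate = {"g1", "g2", "g3"}
runs = [
    [{"g1", "g2", "g3"}, {"g4", "g5"}],  # cluster recovered exactly
    [{"g1", "g2"}, {"g3", "g4", "g5"}],  # cluster partially split
]
score = cluster_stability(candidate, runs)
print(round(score, 3))
```

In a hierarchical setting, a score of this kind would be computed for every cluster in the tree, which is where the number of candidate clusters drives up the computational cost.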
