A general strategy to determine the congruence between a hierarchical and a non-hierarchical classification

Background: Classification procedures are widely used in phylogenetic inference, the analysis of expression profiles, the study of biological networks, and other fields. Many algorithms have been proposed to establish the similarity between two different classifications of the same elements. However, methods to determine significant coincidences between hierarchical and non-hierarchical partitions are still poorly developed, even though the search for such coincidences is implicit in many analyses of massive data.

Results: We describe a novel strategy to compare a hierarchical and a dichotomous non-hierarchical classification of elements, in order to find clusters in a hierarchical tree in which elements of a given "flat" partition are overrepresented. The key improvement of our strategy with respect to previous methods is the use of permutation analyses of ranked clusters to determine whether regions of the dendrogram show a significant enrichment. We show that this method is more sensitive than previously developed strategies and that it can be applied to several real cases, including microarray and interactome data. In particular, we use it to compare a hierarchical representation of the yeast mitochondrial interactome with a catalogue of known mitochondrial protein complexes, demonstrating a high level of congruence between the two classifications. We also discuss extensions of this method to other, conceptually related cases.

Conclusion: Our method is highly sensitive and outperforms previously described strategies. A PERL script that implements it is available at http://www.uv.es/~genomica/treetracker.
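The idea of testing ranked clusters by permutation can be illustrated with a minimal Python sketch. It is not the TreeTracker script, and it rests on several assumptions that the abstract does not state: the tree is given as nested tuples, enrichment in each cluster is scored with a one-sided hypergeometric test, and the r-th best observed score is compared with the r-th best score obtained after randomly reassigning the flat-partition labels. All function names are hypothetical.

```python
import random
from math import comb


def leaves(tree):
    """Return the leaf labels under a node; any non-tuple value is a leaf."""
    if not isinstance(tree, tuple):
        return [tree]
    return [leaf for child in tree for leaf in leaves(child)]


def clusters(tree):
    """Return the leaf set of every internal node (one cluster per node)."""
    if not isinstance(tree, tuple):
        return []
    found = [frozenset(leaves(tree))]
    for child in tree:
        found.extend(clusters(child))
    return found


def hypergeom_tail(k, n, K, N):
    """P(X >= k) when drawing n of N elements, K of which are flagged."""
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return hits / comb(N, n)


def ranked_pvalues(cluster_sets, flagged, N):
    """Enrichment p-value of every cluster, sorted from best to worst."""
    K = len(flagged)
    return sorted(
        hypergeom_tail(len(c & flagged), len(c), K, N) for c in cluster_sets
    )


def permutation_test(tree, flagged, n_perm=1000, seed=0):
    """Return (observed p, empirical p) for each rank of the cluster list.

    Assumption: the empirical p at rank r is the fraction of label
    permutations whose r-th best cluster scores at least as well as the
    observed r-th best cluster (with the usual +1 correction).
    """
    rng = random.Random(seed)
    all_leaves = leaves(tree)
    cluster_sets = clusters(tree)
    flagged = frozenset(flagged)
    N = len(all_leaves)
    observed = ranked_pvalues(cluster_sets, flagged, N)
    as_good = [0] * len(observed)
    for _ in range(n_perm):
        # Reassign the flagged label to a random subset of the same size.
        shuffled = frozenset(rng.sample(all_leaves, len(flagged)))
        permuted = ranked_pvalues(cluster_sets, shuffled, N)
        for r, (obs, perm) in enumerate(zip(observed, permuted)):
            if perm <= obs:
                as_good[r] += 1
    return [(obs, (c + 1) / (n_perm + 1)) for obs, c in zip(observed, as_good)]


if __name__ == "__main__":
    # Toy dendrogram with eight elements; the left half is entirely flagged.
    toy_tree = ((("a", "b"), ("c", "d")), (("e", "f"), ("g", "h")))
    for obs, emp in permutation_test(toy_tree, {"a", "b", "c", "d"}, n_perm=500):
        print(f"hypergeometric p = {obs:.4f}, permutation p = {emp:.4f}")
```

Comparing rank by rank, rather than applying a single threshold to all clusters, accounts for the fact that the clusters of a tree are nested and therefore not independent tests. Whether this matches the published procedure in detail should be checked against the paper itself.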
