Gene annotation and network inference by phylogenetic profiling

BackgroundPhylogenetic analysis is emerging as one of the most informative computational methods for the annotation of genes and identification of evolutionary modules of functionally related genes. The effectiveness with which phylogenetic profiles can be utilized to assign genes to pathways depends on an appropriate measure of correlation between gene profiles, and an effective decision rule to use the correlate. Current methods, though useful, perform at a level well below what is possible, largely because performance of the latter deteriorates rapidly as coverage increases.ResultsWe introduce, test and apply a new decision rule, correlation enrichment (CE), for assigning genes to functional categories at various levels of resolution. Among the results are: (1) CE performs better than standard guilt by association (SGA, assignment to a functional category when a simple correlate exceeds a pre-specified threshold) irrespective of the number of genes assigned (i.e. coverage); improvement is greatest at high coverage where precision (positive predictive value) of CE is approximately 6-fold higher than that of SGA. (2) CE is estimated to allocate each of the 2918 unannotated orthologs to KEGG pathways with an average precision of 49% (approximately 7-fold higher than SGA) (3) An estimated 94% of the 1846 unannotated orthologs in the COG ontology can be assigned a function with an average precision of 0.4 or greater. (4) Dozens of functional and evolutionarily conserved cliques or quasi-cliques can be identified, many having previously unannotated genes.ConclusionThe method serves as a general computational tool for annotating large numbers of unknown genes, uncovering evolutionary and functional modules. It appears to perform substantially better than extant stand alone high throughout methods.

[1]  Ying Xu,et al.  Prediction of functional modules based on comparative genome analysis and Gene Ontology application , 2005, Nucleic acids research.

[2]  D. Eisenberg,et al.  Assigning protein functions by comparative genome analysis: protein phylogenetic profiles. , 1999, Proceedings of the National Academy of Sciences of the United States of America.

[3]  L. Aravind Guilt by association: contextual information in genome analysis. , 2000, Genome research.

[4]  B. Schwikowski,et al.  A network of protein–protein interactions in yeast , 2000, Nature Biotechnology.

[5]  Hiroyuki Ogata,et al.  KEGG: Kyoto Encyclopedia of Genes and Genomes , 1999, Nucleic Acids Res..

[6]  Michael Y. Galperin,et al.  The COG database: new developments in phylogenetic classification of proteins from complete genomes , 2001, Nucleic Acids Res..

[7]  Berend Snel,et al.  Quantifying modularity in the evolution of biomolecular systems. , 2004, Genome research.

[8]  C. DeLisi,et al.  The society of genes: networks of functional links between genes from comparative genomics , 2002, Genome Biology.

[9]  Anton J. Enright,et al.  Protein interaction maps for complete genomes based on gene fusion events , 1999, Nature.

[10]  Yaniv Ziv,et al.  Revealing modular organization in the yeast transcriptional network , 2002, Nature Genetics.

[11]  Jean-Michel Claverie,et al.  Phydbac "Gene Function Predictor" : a gene annotation tool based on genomic context analysis , 2005, BMC Bioinformatics.

[12]  Ram Samudrala,et al.  Enhanced functional information from predicted protein networks. , 2004, Trends in biotechnology.

[13]  Christian von Mering,et al.  STRING: known and predicted protein–protein associations, integrated and transferred across organisms , 2004, Nucleic Acids Res..

[14]  Stanley Letovsky,et al.  Predicting protein function from protein/protein interaction data: a probabilistic approach , 2003, ISMB.

[15]  Warren C. Lathe,et al.  Predicting protein function by genomic context: quantitative evaluation and qualitative inferences. , 2000, Genome research.

[16]  B. Snel,et al.  Genome phylogeny based on gene content , 1999, Nature Genetics.

[17]  Yan P. Yuan,et al.  Predicting function: from genes to genomes and back. , 1998, Journal of molecular biology.

[18]  Satoru Miyano,et al.  Estimating gene regulatory networks and protein-protein interactions of Saccharomyces cerevisiae from multiple genome-wide data , 2005, ECCB/JBI.

[19]  Anton J. Enright,et al.  Human MicroRNA Targets , 2004, PLoS biology.

[20]  Edward M Marcotte,et al.  Discovery of uncharacterized cellular systems by genome-wide analysis of functional linkages , 2003, Nature Biotechnology.

[21]  Simon Kasif,et al.  Identification of functional links between genes using phylogenetic profiles , 2003, Bioinform..

[22]  S. Kasif,et al.  Whole-genome annotation by using evidence integration in functional-linkage networks. , 2004, Proceedings of the National Academy of Sciences of the United States of America.

[23]  R. Overbeek,et al.  The use of gene clusters to infer functional coupling. , 1999, Proceedings of the National Academy of Sciences of the United States of America.

[24]  D. Eisenberg,et al.  Detecting protein function and protein-protein interactions from genome sequences. , 1999, Science.

[25]  J. Blake,et al.  Creating the Gene Ontology Resource : Design and Implementation The Gene Ontology Consortium 2 , 2001 .

[26]  Matteo Pellegrini,et al.  Detection of parallel functional modules by comparative analysis of genome sequences , 2005, Nature Biotechnology.

[27]  L. Mirny,et al.  Protein complexes and functional modules in molecular networks , 2003, Proceedings of the National Academy of Sciences of the United States of America.

[28]  Joshua M. Stuart,et al.  A Gene-Coexpression Network for Global Discovery of Conserved Genetic Modules , 2003, Science.

[29]  M. Samanta,et al.  Predicting protein functions from redundancies in large-scale protein interaction networks , 2003, Proceedings of the National Academy of Sciences of the United States of America.

[30]  P. Bork,et al.  Genome evolution reveals biochemical networks and functional modules , 2003, Proceedings of the National Academy of Sciences of the United States of America.

[31]  Alessandro Vespignani,et al.  Global protein function prediction from protein-protein interaction networks , 2003, Nature Biotechnology.

[32]  P. Uetz,et al.  Towards an understanding of complex protein networks. , 2001, Trends in cell biology.

[33]  Darren A. Natale,et al.  The COG database: an updated version includes eukaryotes , 2003, BMC Bioinformatics.

[34]  Edward M. Marcotte,et al.  Protein function prediction using the Protein Link EXplorer (PLEX) , 2005, Bioinform..

[35]  Matteo Pellegrini,et al.  Prolinks: a database of protein functional linkages derived from coevolution , 2004, Genome Biology.

[36]  Ting Wang,et al.  Combining phylogenetic data with co-regulated genes to identify regulatory motifs , 2003, Bioinform..

[37]  Satoru Miyano,et al.  Estimating gene networks from gene expression data by combining Bayesian network model with promoter element detection , 2003, ECCB.

[38]  B. Snel,et al.  Predicting gene function by conserved co-expression. , 2003, Trends in genetics : TIG.

[39]  S. Dwight,et al.  Predicting gene function from patterns of annotation. , 2003, Genome research.

[40]  G. Sumara,et al.  A Probabilistic Functional Network of Yeast Genes , 2004 .

[41]  Nicola J. Rinaldi,et al.  Computational discovery of gene modules and regulatory networks , 2003, Nature Biotechnology.

[42]  T. Gaasterland,et al.  Microbial genescapes: phyletic and functional patterns of ORF distribution among prokaryotes. , 1998, Microbial & comparative genomics.