A cross-genomic approach for systematic mapping of phenotypic traits to genes.

We present a computational method for de novo identification of gene function using only the cross-organismal distribution of phenotypic traits. Our approach assumes that proteins necessary for a set of phenotypic traits are preferentially conserved among organisms that share those traits. The method combines organism-to-phenotype associations with phylogenetic profiles to identify proteins that have a high propensity for the query phenotype; it requires no functional annotation of any protein. We first present the statistical foundations of this approach and then apply it to a range of phenotypes to assess how its performance depends on the frequency and specificity of the phenotype. Our analysis shows that statistically significant associations are possible as long as the phenotype is neither extremely rare nor extremely common. Results on the flagella, pili, thermophily, and respiratory tract tropism phenotypes suggest that reliable associations can be inferred when the phenotype does not arise from many alternative mechanisms.
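The core idea can be illustrated with a minimal sketch. The abstract does not specify the statistic used, so the code below assumes a one-sided hypergeometric test as a stand-in: for each protein, it asks how surprising it is that so many of the organisms carrying a homolog also show the phenotype. The function names (`hypergeom_tail`, `score_proteins`, `best_possible_pvalue`) and the toy data are hypothetical and are not taken from the paper.

```python
from math import comb


def hypergeom_tail(k, K, n, N):
    """P(X >= k): chance that at least k of the n organisms carrying a
    protein have the phenotype, when K of all N organisms have it."""
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def score_proteins(profiles, phenotype):
    """profiles: protein -> set of organisms with a homolog (phylogenetic profile).
    phenotype: organism -> True/False (organism-to-phenotype association).
    Returns protein -> p-value under the assumed hypergeometric null."""
    organisms = set(phenotype)
    N = len(organisms)
    K = sum(1 for o in organisms if phenotype[o])
    scores = {}
    for protein, present in profiles.items():
        present = present & organisms
        n = len(present)
        k = sum(1 for o in present if phenotype[o])
        scores[protein] = hypergeom_tail(k, K, n, N)
    return scores


def best_possible_pvalue(K, N):
    """Smallest p-value any protein can reach: a profile that matches the
    K phenotype-positive organisms exactly."""
    return 1 / comb(N, K)


# Toy data (hypothetical): 8 organisms, 4 of them with the phenotype.
phenotype = {f"org{i}": i < 4 for i in range(8)}
profiles = {
    "protA": {"org0", "org1", "org2", "org3"},   # tracks the phenotype exactly
    "protB": {f"org{i}" for i in range(8)},      # present everywhere
}
scores = score_proteins(profiles, phenotype)
```

Under this assumed test, `protA` scores 1/70 while the ubiquitous `protB` scores 1. `best_possible_pvalue(1, 8)` and `best_possible_pvalue(7, 8)` both give 1/8, which is one way to see why a phenotype that is extremely rare or extremely common cannot yield a significant association: even a perfectly matching profile is not surprising enough.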
