A Semi-Quantitative, Synteny-Based Method to Improve Functional Predictions for Hypothetical and Poorly Annotated Bacterial and Archaeal Genes

During microbial evolution, genome rearrangement increases with sequence divergence. If the relationship between synteny and sequence divergence can be modeled, gene clusters that exhibit anomalous synteny between the genomes of distantly related organisms can be identified and used to infer functional conservation. We applied the phylogenetic pairwise comparison method to establish and model a strong correlation between synteny and sequence divergence across all 634 archaeal and bacterial genomes available in the NCBI database, together with four newly assembled genomes of uncultivated Archaea from an acid mine drainage (AMD) community. In parallel, we established and modeled the trend between synteny and functional relatedness in the 118 genomes available in the STRING database. By combining these models, we developed a gene functional annotation method that weights evolutionary distance to estimate the probability of functional association between syntenic proteins in a pair of genomes. We applied the method to hypothetical proteins and poorly annotated genes in the newly assembled AMD archaeal genomes to add or improve gene annotations. This is the first method to assign possible functions to poorly annotated genes by quantifying the probability of gene functional relationships from synteny conserved across a significant evolutionary distance, and it has the potential for broad application.
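The core idea, comparing observed synteny in a genome pair against what their sequence divergence would predict, can be illustrated with a minimal Python sketch. This is not the authors' implementation. The adjacency-based synteny measure, the exponential form of `expected_synteny`, and the `decay` and `margin` parameters are all assumptions chosen for illustration, since the abstract does not state the fitted model. Genomes are represented as ordered lists of hypothetical ortholog-group identifiers.

```python
from math import exp


def adjacent_pairs(gene_order):
    """Return the set of unordered neighbouring pairs in a linear gene order."""
    return {frozenset(pair) for pair in zip(gene_order, gene_order[1:])}


def synteny_fraction(order_a, order_b):
    """Fraction of adjacencies among shared orthologs in A that are also adjacent in B.

    Genes without an ortholog in the other genome are dropped first, so
    adjacency is judged on the shared orthologs only (an illustrative choice).
    """
    shared = set(order_a) & set(order_b)
    a = [g for g in order_a if g in shared]
    b = [g for g in order_b if g in shared]
    pairs_a = adjacent_pairs(a)
    if not pairs_a:
        return 0.0
    return len(pairs_a & adjacent_pairs(b)) / len(pairs_a)


def expected_synteny(divergence, decay=5.0):
    """Hypothetical placeholder model: synteny decays exponentially with divergence.

    The functional form and the decay constant are assumptions, not values
    taken from the paper.
    """
    return exp(-decay * divergence)


def is_anomalous(observed, divergence, margin=0.2):
    """Flag a genome pair whose synteny exceeds the model expectation by `margin`."""
    return observed > expected_synteny(divergence) + margin


# Toy gene orders labelled by ortholog group (illustrative data only).
order_a = ["og1", "og2", "og3", "og4", "og5"]
order_b = ["og3", "og4", "og9", "og1", "og2", "og5"]

observed = synteny_fraction(order_a, order_b)
print(observed, is_anomalous(observed, divergence=0.5))
```

In this toy case two of the four adjacencies in the first genome are preserved in the second. At the assumed divergence of 0.5 that exceeds the placeholder expectation, so the pair would be flagged for closer inspection. A real analysis would replace the placeholder with a model fitted to genome-pair data.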
