Consistency of gene starts among Burkholderia genomes

Background: Evolutionary divergence in the position of the translational start site among orthologous genes can have significant functional impacts. Divergence can alter the translation rate, degradation rate, subcellular location, and function of the encoded proteins.

Results: Existing Genbank gene maps for Burkholderia genomes suggest that extensive divergence has occurred: 53% of ortholog sets based on Genbank gene maps had inconsistent gene start sites. However, most of these inconsistencies appear to be gene-calling errors. Evolutionary divergence was the most plausible explanation for only 17% of the ortholog sets. Correcting probable errors in the Genbank gene maps decreased the percentage of ortholog sets with inconsistent starts by 68%, increased the percentage of ortholog sets with extractable upstream intergenic regions by 32%, increased the sequence similarity of intergenic regions and predicted proteins, and increased the number of proteins with identifiable signal peptides.

Conclusions: Our findings highlight an emerging problem in comparative genomics: single-digit percent errors in gene predictions can lead to double-digit percentages of inconsistent ortholog sets. The work demonstrates a simple approach to evaluate and improve the quality of gene maps.
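The arithmetic behind the last point can be illustrated with a minimal sketch. It rests on assumptions that are not taken from the study: start-site errors are treated as independent across genomes, every gene has the same error rate, and any single miscalled start is taken to make the whole ortholog set inconsistent. The function name and the example rates and set sizes are hypothetical.

```python
def fraction_inconsistent_sets(per_gene_error_rate: float, genomes_per_set: int) -> float:
    """Probability that at least one gene in an ortholog set has a miscalled start.

    Assumes independent errors with a uniform per-gene rate (an illustrative
    simplification, not the model used in the study).
    """
    if not 0.0 <= per_gene_error_rate <= 1.0:
        raise ValueError("per_gene_error_rate must be between 0 and 1")
    if genomes_per_set < 1:
        raise ValueError("genomes_per_set must be at least 1")
    # Complement of "every gene in the set has a correctly called start".
    return 1.0 - (1.0 - per_gene_error_rate) ** genomes_per_set


if __name__ == "__main__":
    # Hypothetical error rates and ortholog set sizes, chosen only for illustration.
    for rate in (0.01, 0.05, 0.09):
        for n_genomes in (5, 10):
            share = fraction_inconsistent_sets(rate, n_genomes)
            print(f"error rate {rate:.0%}, {n_genomes} genomes per set: {share:.1%} of sets affected")
```

Under these assumptions the affected share grows roughly in proportion to the number of genomes compared, which is why small per-gene error rates compound across an ortholog set.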
