Starts of bacterial genes: estimating the reliability of computer predictions.

Exact mapping of gene starts is an important problem in the computer-assisted functional analysis of newly sequenced prokaryotic genomes. We describe an algorithm for finding ribosomal binding sites (RBS) without a learning sample. This algorithm is particularly useful for analyzing genomes with few or no experimentally mapped genes. There is a clear correlation between the RBS properties of a given genome and the accuracy that gene start prediction can achieve for it. This correlation has considerable predictive power and may be useful for estimating the expected success of future genome analysis efforts. We also demonstrate that the RBS properties depend on the phylogenetic position of a genome.
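The abstract does not spell out the algorithm, so the following is only a minimal illustrative sketch of the general idea of using an RBS signal to rank candidate gene starts; it is not the authors' method. It assumes the canonical Shine-Dalgarno core AGGAGG as a fixed motif (whereas the paper derives the signal without a learning sample), and the function names, the 20-nucleotide upstream window, and the mismatch-count score are all hypothetical choices made for illustration.

```python
# Assumption: canonical Shine-Dalgarno core, used here as a fixed motif.
SD_CONSENSUS = "AGGAGG"


def best_sd_match(upstream, motif=SD_CONSENSUS):
    """Return (matches, offset) for the best ungapped match of motif in upstream.

    Ties are resolved in favor of the leftmost window. If upstream is shorter
    than the motif, or nothing matches, the result is (0, -1).
    """
    best = (0, -1)
    for i in range(len(upstream) - len(motif) + 1):
        window = upstream[i:i + len(motif)]
        matches = sum(a == b for a, b in zip(window, motif))
        if matches > best[0]:
            best = (matches, i)
    return best


def rank_candidate_starts(seq, start_codons=("ATG", "GTG", "TTG"), upstream_len=20):
    """Rank candidate start codons by the strength of the upstream RBS-like signal.

    Returns a list of (score, position) pairs, strongest signal first.
    Candidates closer than upstream_len to the 5' end are skipped because
    their upstream window would be incomplete.
    """
    candidates = []
    for pos in range(upstream_len, len(seq) - 2):
        if seq[pos:pos + 3] in start_codons:
            score, _offset = best_sd_match(seq[pos - upstream_len:pos])
            candidates.append((score, pos))
    return sorted(candidates, key=lambda c: (-c[0], c[1]))


if __name__ == "__main__":
    # Toy sequence: an ATG preceded by AGGAGG, and a GTG with no upstream signal.
    toy = "TTTTT" + "AGGAGG" + "TTTTTTTTT" + "ATG" + "C" * 20 + "GTG" + "CCC"
    for score, pos in rank_candidate_starts(toy):
        print(pos, score)
```

In a genome whose RBS signal is weak or absent, a score of this kind would separate true starts from false ones poorly, which is the intuition behind the correlation between RBS properties and prediction accuracy described above.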
