Computer survey for likely genes in the one megabase contiguous genomic sequence data of Synechocystis sp. strain PCC6803.

Using the computer program GeneMark, the open reading frames (ORFs) previously assigned within the one-megabase sequence of the genome of the cyanobacterium Synechocystis sp. strain PCC6803 (Kaneko et al., DNA Res. 2: 153-166, 1995) were re-examined. The matrices that GeneMark requires for its statistical calculations were generated and refined by running a script termed GeneMark-Genesis, which applied GeneMark recursively to the Synechocystis data and evaluated the probability scores for optimization. Based on the matrices thus generated, 752 of the 818 previously assigned ORFs (92%) were supported by GeneMark as likely coding sequences, of which 26 were predicted to start at positions internal to the previously assigned start sites. In addition, 50 ORFs were newly identified as likely coding sequences, most of them shorter than 300 bp. Thus, the procedure proved to be a powerful means of locating likely coding regions within the genomic sequence data of Synechocystis without prior information on their similarity to genes of other organisms. However, GeneMark did not predict 66 of the previously assigned ORFs as likely genes: 14 of them showed significant similarity to known genes, and 10 others were found within IS-like elements. It seems that these genes, many of which appear to be of exogenous origin, escaped detection by GeneMark, as in the case of the "class 3 (horizontally transferred) genes" of E. coli. This in turn suggests that genes of different phylogenetic origins might also be detected as such by modifying the matrices.
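The recursive training idea described above can be illustrated with a minimal sketch. It is not GeneMark or the GeneMark-Genesis script: it assumes a plain codon-frequency model scored against a uniform background, whereas GeneMark uses Markov-model matrices. All function names, the pseudocount, and the score threshold are hypothetical choices made for illustration only.

```python
import math
from collections import Counter
from itertools import product

# All 64 codons over the DNA alphabet.
CODONS = ["".join(c) for c in product("ACGT", repeat=3)]


def train_model(seqs, pseudocount=1.0):
    """Estimate codon frequencies from in-frame sequences (assumed model)."""
    counts = Counter()
    for s in seqs:
        for i in range(0, len(s) - 2, 3):
            counts[s[i:i + 3]] += 1
    total = sum(counts[c] for c in CODONS) + pseudocount * len(CODONS)
    return {c: (counts[c] + pseudocount) / total for c in CODONS}


def score(seq, model):
    """Mean per-codon log-odds of the model versus a uniform background."""
    codons = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]
    codons = [c for c in codons if c in model]
    if not codons:
        return float("-inf")
    return sum(math.log(model[c] * len(CODONS)) for c in codons) / len(codons)


def self_train(candidates, seed, threshold=0.0, max_rounds=10):
    """Retrain on the ORFs the current model accepts until the set is stable.

    candidates: list of in-frame ORF sequences (strings over ACGT)
    seed: indices of ORFs used to train the initial model
    """
    accepted = set(seed)
    model = train_model([candidates[i] for i in sorted(accepted)])
    for _ in range(max_rounds):
        new = {i for i, s in enumerate(candidates) if score(s, model) > threshold}
        if not new or new == accepted:
            break  # converged, or nothing passes the threshold
        accepted = new
        model = train_model([candidates[i] for i in sorted(accepted)])
    return model, accepted
```

Starting from a seed set of trusted ORFs, each round rescores every candidate and retrains on those that pass, so the model gradually adapts to the genome's own coding statistics. ORFs with atypical codon usage, such as horizontally transferred genes, would tend to be rejected by a model trained this way, which is consistent with the behavior reported for the 66 unsupported ORFs.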
