Computer Methods to Locate Genes and Signals in Nucleic Acid Sequences

Computer methods are becoming increasingly important both during the determination of a DNA sequence and in its subsequent analysis. Sequencing methods are rapid and easy to apply, so they generate a great deal of data, and the rate of sequencing far outstrips the rate at which experiments can be done to elucidate the function of the sequences derived. Elucidating the function of a sequence includes mapping messenger RNAs, promoters, splice junctions and other control regions. A positive experimental result has the great advantage over computer analysis of giving firm evidence, but computer methods are fast and cheap. The purpose of this article is to describe some of the computer techniques developed for locating these sequence features. I include methods to locate protein genes, tRNA genes, promoters, ribosome binding sites, splice junctions, terminator sequences and polyadenylation sites. I shall refer to sequences such as promoters and ribosome binding sites as “signal sequences”. We need to be able to scan through a sequence and give some measure of the probability that each section of it contains any of these features.
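As an illustration of scanning a sequence and scoring each section, the following is a minimal sketch of one common approach: a log-odds weight matrix built from a few aligned example sites and slid along the sequence. It is not taken from the article. The function names, the pseudocount, the uniform background frequency of 0.25 and the four example sites are all assumptions made for this example, and the sites are toy data rather than a real compilation.

```python
import math

BASES = "ACGT"


def build_log_odds(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds weight matrix from equal-length aligned sites.

    The pseudocount and uniform background are illustrative assumptions.
    """
    length = len(sites[0])
    matrix = []
    for pos in range(length):
        # Start every base at the pseudocount so unseen bases get a finite score.
        counts = {base: pseudocount for base in BASES}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        matrix.append(
            {base: math.log2((counts[base] / total) / background) for base in BASES}
        )
    return matrix


def scan(sequence, matrix):
    """Return (start, score) for every window of the matrix width.

    Assumes the sequence contains only the letters A, C, G and T.
    """
    width = len(matrix)
    scores = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(matrix[i][base] for i, base in enumerate(window))
        scores.append((start, score))
    return scores


# Toy, made-up aligned sites loosely resembling a -10 hexamer (hypothetical data).
example_sites = ["TATAAT", "TATGAT", "TACAAT", "TATACT"]
matrix = build_log_odds(example_sites)

sequence = "GGCGTATAATGCGC"
scores = scan(sequence, matrix)
best_start, best_score = max(scores, key=lambda item: item[1])
```

Here each window receives a score that is higher the more closely it resembles the example sites, so `best_start` points at the embedded `TATAAT`. A practical search would build the matrix from a large compilation of known sites and choose a score cutoff from their observed score distribution.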
