Prediction of Function in DNA Sequence

ABSTRACT Recognition of the function of newly sequenced DNA fragments is an important area of computational molecular biology. Here we present an extensive review of methods for the prediction of functional sites, tRNA, and protein-coding genes, and discuss possible further directions of research in this area.

Key words: DNA sequence analysis; functional sites; genes; protein-coding regions; exons; introns; prediction; tRNA
