Analysis and Predictions from Escherichia coli Sequences, or E. coli In Silico

Bacteria have been studied as living entities (in vivo) for 150 years and in cell-free systems (in vitro) for almost half that time. With the availability of DNA sequence information, it has become evident that genes can also be studied as lines of text. A new domain of knowledge and research concerned with this means of studying living systems has arisen; it has been referred to as informatics, but it also involves standard aspects of mathematics and statistics. Besides using computer programs that mechanically perform standard but tedious analysis of the information content of DNA, investigators are now using programs that help generate new knowledge about this information. Thus, in addition to the study of bacteria in vivo and in vitro, there is now an active endeavor studying them “in silico.” Escherichia coli has been a paradigm for such studies. It is anticipated that the E. coli genome sequence will be known by the end of 1997. In parallel, a vast amount of data on a variety of organisms has been collected, and it has become an important task not only to handle this huge quantity of information but also to extract from it the features that pertain to the concrete expression of life in general and to E. coli in particular. For the E. coli geneticist, no literature reviewing this new aspect of research exists; information is scattered through a vast number of journals and papers, which often present independent but redundant approaches. Here we have summarized the less self-evident aspects of the data presented in the literature. Readers interested in features relevant specifically to informatics can find in the databank SEQANALREF an updated bibliography on software dealing with sequence analysis (SEQANALREF, present in the EMBL data library package, contained 3,076 references in release 64 [October 1995]). Major DNA and protein data banks are accessible on the Internet. The appropriate addresses can be found in references 23, 42, and 71.
We shall follow the path that molecular geneticists pursue when they use formal techniques for investigating the significance of genes at the DNA level or of proteins at the polypeptide chain level: acquisition of sequences, analysis of these data, and management of DNA, RNA, and protein sequence data.

[1]  A. A. Mullin,et al.  Principles of neurodynamics , 1962 .

[2]  M. O. Dayhoff,et al.  Atlas of protein sequence and structure , 1965 .

[3]  Satosi Watanabe,et al.  Knowing and guessing , 1969 .

[4]  J. Shine,et al.  The 3'-terminal sequence of Escherichia coli 16S ribosomal RNA: complementarity to nonsense triplets and ribosome binding sites. , 1974, Proceedings of the National Academy of Sciences of the United States of America.

[5]  M. Hill Correspondence Analysis: A Neglected Multivariate Method , 1974 .

[6]  D. Pribnow Nucleotide sequence of an RNA polymerase binding site at an early T7 promoter. , 1975, Proceedings of the National Academy of Sciences of the United States of America.

[7]  R. Bambara,et al.  On the statistical significance of primary structural features found in DNA-protein interaction sites. , 1975, Nucleic acids research.

[8]  Tom Maniatis,et al.  Recognition sequences of repressor and polymerase in the operators of bacteriophage lambda , 1975, Cell.

[9]  A. Danchin The specification of the immune response: a general selective model. , 1979, Molecular immunology.

[10]  Andrew Odlyzko,et al.  Long repetitive patterns in random sequences , 1980 .

[11]  Walter Gilbert,et al.  E. coli RNA polymerase interacts homologously with two different promoters , 1980, Cell.

[12]  T. Ikemura Correlation between the abundance of Escherichia coli transfer RNAs and the occurrence of the respective codons in its protein genes: a proposal for a synonymous codon choice that is optimal for the E. coli translational system. , 1981, Journal of molecular biology.

[13]  T. Ikemura Correlation between the abundance of Escherichia coli transfer RNAs and the occurrence of the respective codons in its protein genes. , 1981, Journal of molecular biology.

[14]  Temple F. Smith,et al.  Comparison of biosequences , 1981 .

[15]  Manolo Gouy,et al.  Codon catalog usage is a genome strategy modulated for gene expressivity , 1981, Nucleic Acids Res..

[16]  T. D. Schneider,et al.  Use of the 'Perceptron' algorithm to distinguish translational initiation sites in E. coli. , 1982, Nucleic acids research.

[17]  M. Gouy,et al.  Codon usage in bacteria: correlation with gene expressivity. , 1982, Nucleic acids research.

[18]  R. Staden Automation of the computer handling of gel reading data produced by the shotgun method of DNA sequencing. , 1982, Nucleic acids research.

[19]  T. D. Schneider,et al.  Characterization of Translational Initiation Sites in E. coli , 1982 .

[20]  M Ikehara,et al.  Essential structure of E. coli promoter: effect of spacer length between the two consensus sequences on promoter function. , 1983, Nucleic acids research.

[21]  D. K. Hawley,et al.  Compilation and analysis of Escherichia coli promoter DNA sequences. , 1983, Nucleic acids research.

[22]  H. Ochman,et al.  Standard reference strains of Escherichia coli from natural populations , 1984, Journal of bacteriology.

[23]  M. Ehrenberg,et al.  Costs of accuracy determined by a maximal growth rate constraint , 1984, Quarterly Reviews of Biophysics.

[24]  J. Devereux,et al.  A comprehensive set of sequence analysis programs for the VAX , 1984, Nucleic Acids Res..

[25]  A. Morineau,et al.  Multivariate descriptive statistical analysis , 1984 .

[26]  C. Alff-Steinberger,et al.  Evidence for a coding pattern on the non-coding strand of the E. coli genome. , 1984, Nucleic acids research.

[27]  M. Gribskov,et al.  The codon preference plot: graphic analysis of protein coding sequences and prediction of gene expression , 1984, Nucleic Acids Res..

[28]  Rodger Staden,et al.  A computer program to enter DNA gel reading data into a computer , 1984, Nucleic Acids Res..

[29]  R. Blake,et al.  Analysis of the codon bias in E. coli sequences. , 1984, Journal of biomolecular structure & dynamics.

[30]  R Staden Computer methods to locate signals in nucleic acid sequences , 1984, Nucleic Acids Res..

[31]  Hans Söderlund,et al.  SEQAID: a DNA sequence assembling program based on a mathematical model , 1984, Nucleic Acids Res..

[32]  W. John Wilbur,et al.  On the statistical significance of nucleic acid similarities , 1984, Nucleic Acids Res..

[33]  J Sallantin,et al.  Localization of the initiation of translation in messenger RNAs of prokaryotes by learning techniques. , 1985, Biochimie.

[34]  W. McClure,et al.  Mechanism and control of transcription initiation in prokaryotes. , 1985, Annual review of biochemistry.

[35]  A. Hénaut,et al.  The origins of the strategy of codon use. , 1985, Biochimie.

[36]  P. Vigier,et al.  Etude des contraintes qui s'exercent sur la succession des bases dans un polynucléotide. I: La signification de la dégénérescence du code , 1985 .

[37]  [Structural descriptions. Discrimination and learning of these descriptions]. , 1985, Biochimie.

[38]  David Haussler,et al.  The Smallest Automaton Recognizing the Subwords of a Text , 1985, Theor. Comput. Sci..

[39]  Marcella Attimonelli,et al.  ACNUC - a portable retrieval system for nucleic acid sequence databases: logical and physical designs and usage , 1985, Comput. Appl. Biosci..

[40]  J Sallantin,et al.  Search for promoter sites of prokaryotic DNA using learning techniques. , 1985, Biochimie.

[41]  H Soldano,et al.  Statistico-syntactic learning techniques. , 1985, Biochimie.

[42]  T. D. Schneider,et al.  Information content of binding sites on nucleotide sequences. , 1986, Journal of molecular biology.

[43]  P. Sharp,et al.  Codon usage in regulatory genes in Escherichia coli does not reflect selection for 'rare' codons. , 1986, Nucleic acids research.

[44]  Etude des contraintes qui s'exercent sur la succession des bases dans un polynucléotide. II: La distribution des tétranucléotides complémentaires dans les gènes d'Escherichia coli et des bactériophages lambda et T7. , 1986 .

[45]  E N Trifonov,et al.  Terminators of transcription with RNA polymerase from Escherichia coli: what they look like and how to find them. , 1986, Journal of biomolecular structure & dynamics.

[46]  T Platt,et al.  Transcription termination and the regulation of gene expression. , 1986, Annual review of biochemistry.

[47]  W. Fiers,et al.  Inefficient translation initiation causes premature transcription termination in the lacZ gene , 1986, Cell.

[48]  Volker Brendel,et al.  Gnomic: a dictionary of genetic codes , 1986 .

[49]  [Statistical characteristics in primary structures of functional regions of Escherichia coli genome. II. Non-stationary Markov chains]. , 1986, Molekuliarnaia biologiia.

[50]  James L. McClelland,et al.  Parallel distributed processing: explorations in the microstructure of cognition, vol. 1: foundations , 1986 .

[51]  Christian Gautier,et al.  Statistical method for predicting protein coding regions in nucleic acid sequences , 1987, Comput. Appl. Biosci..

[52]  R. Ivarie,et al.  The effect of codon usage on the oligonucleotide composition of the E. coli genome and identification of over- and underrepresented sequences by Markov chain analysis. , 1987, Nucleic acids research.

[53]  M. Nelson,et al.  Restriction endonucleases for pulsed field mapping of bacterial genomes. , 1987, Nucleic acids research.

[54]  S. Harrison,et al.  Effect of non-contacted bases on the affinity of 434 operator for 434 repressor and Cro , 1987, Nature.

[55]  P. Sharp,et al.  The codon Adaptation Index--a measure of directional synonymous codon usage bias, and its potential applications. , 1987, Nucleic acids research.

[56]  K. Isono,et al.  The physical map of the whole E. coli chromosome: Application of a new strategy for rapid analysis and sorting of a large genomic library , 1987, Cell.

[57]  P. V. von Hippel,et al.  Selection of DNA binding sites by regulatory proteins. Statistical-mechanical theory and application to operators and promoters. , 1987, Journal of molecular biology.

[58]  C. Harley,et al.  Analysis of E. coli promoter sequences. , 1987, Nucleic acids research.

[59]  M. Dreyfus,et al.  What constitutes the signal for the initiation of protein synthesis on Escherichia coli mRNAs? , 1988, Journal of molecular biology.

[60]  F Pfeiffer,et al.  VecBase, a cloning vector sequence data base. , 1988, Protein sequences & data analysis.

[61]  George B. Petersen,et al.  Messenger RNA recognition in Escherichia coli: a possible second site of interaction with 16S ribosomal RNA. , 1988, The EMBO journal.

[62]  Jacob V. Maizel,et al.  Discriminant analysis of promoter regions in Escherichia coli sequences , 1988, Comput. Appl. Biosci..

[63]  T. Cech,et al.  Conserved sequences and structures of group I introns: building an active site for RNA catalysis--a review. , 1988, Gene.

[64]  D. Lipman,et al.  Improved tools for biological sequence comparison. , 1988, Proceedings of the National Academy of Sciences of the United States of America.

[65]  David B. Searls Representing Genetic Information with Formal Grammars , 1988, AAAI.

[66]  Alain Hénaut,et al.  Merging of distance matrices and classification by dynamic clustering , 1988, Comput. Appl. Biosci..

[67]  G. Cameron,et al.  The EMBL data library. , 1988, Nucleic acids research.

[68]  Alain Hénaut,et al.  Distance matrix comparison and tree construction , 1988, Pattern Recognit. Lett..

[69]  M. Belfort,et al.  Structural conservation among three homologous introns of bacteriophage T4 and the group I introns of eukaryotes. , 1988, Proceedings of the National Academy of Sciences of the United States of America.

[70]  M. O'Neill,et al.  Consensus methods for finding and ranking DNA binding sites. Application to Escherichia coli promoters. , 1989, Journal of molecular biology.

[71]  A V Lukashin,et al.  Neural network models for promoter recognition. , 1989, Journal of biomolecular structure & dynamics.

[72]  David B. Searls Investigating the Linguistics of DNA with Definite Clause Grammars , 1989, NACLP.

[73]  P. Pevzner,et al.  Linguistics of nucleotide sequences. I: The significance of deviations from mean statistical characteristics and prediction of the frequencies of occurrence of words. , 1989, Journal of biomolecular structure & dynamics.

[74]  M. O'Neill,et al.  Escherichia coli promoters. II. A spacing class-dependent promoter search protocol. , 1989, The Journal of biological chemistry.

[75]  T A Thanaraj,et al.  An additional ribosome-binding site on mRNA of highly expressed genes and a bifunctional site on the colicin fragment of 16S rRNA from Escherichia coli: important determinants of the efficiency of translation-initiation. , 1989, Nucleic acids research.

[76]  A. Wada,et al.  Novel third-letter bias in Escherichia coli codons revealed by rigorous treatment of coding constraints. , 1989, Journal of molecular biology.

[77]  M. O'Neill Escherichia coli promoters. I. Consensus as it relates to spacing class, specificity, repeat substructure, and three-dimensional organization. , 1989, The Journal of biological chemistry.

[78]  J. Palmer,et al.  An ancient group I intron shared by eubacteria and chloroplasts , 1990, Science.

[79]  W Miller,et al.  Alignment of Escherichia coli K12 DNA sequences to a genomic restriction map. , 1990, Nucleic acids research.

[80]  V. Emilsson,et al.  Growth rate dependence of transfer RNA abundance in Escherichia coli. , 1990, The EMBO journal.

[81]  E. Myers,et al.  Basic local alignment search tool. , 1990, Journal of molecular biology.

[82]  M. L. Sprengart,et al.  The initiation of translation in E. coli: apparent base pairing between the 16srRNA and downstream sequences of the mRNA. , 1990, Nucleic acids research.

[83]  E. Westhof,et al.  Modelling of the three-dimensional architecture of group I catalytic introns based on comparative sequence analysis. , 1990, Journal of molecular biology.

[84]  J. van Duin,et al.  Secondary structure of the ribosome binding site determines translational efficiency: a quantitative analysis. , 1990, Proceedings of the National Academy of Sciences of the United States of America.

[85]  Ming-Qun Xu,et al.  Bacterial origin of a chloroplast intron: conserved self-splicing group I introns in cyanobacteria , 1990, Science.

[86]  B P Gaber,et al.  NRL-3D: a sequence-structure database derived from the protein data bank (PDB) and searchable within the PIR environment. , 1990, Protein sequences & data analysis.

[87]  E. Brody,et al.  Prediction of rho-independent Escherichia coli transcription terminators. A statistical analysis of their RNA stem-loop structures. , 1990 .

[88]  G. Stormo Consensus patterns in DNA. , 1990, Methods in enzymology.

[89]  A. Danchin,et al.  Mapping of sequenced genes (700 kbp) in the restriction map of the Escherichia coli chromosome , 1990, Molecular microbiology.

[90]  Escherichia coli K12 genomic database. , 1990, Protein sequences & data analysis.

[91]  A. Danchin,et al.  Evidence for horizontal gene transfer in Escherichia coli speciation. , 1991, Journal of molecular biology.

[92]  Rodger Staden,et al.  An X windows and UNIX implementation of our sequence analysis package , 1991, Comput. Appl. Biosci..

[93]  A. Bairoch PROSITE: a dictionary of sites and patterns in proteins. , 1991, Nucleic acids research.

[94]  W Miller,et al.  Mapping sequenced E.coli genes by computer: software, strategies and examples. , 1991, Nucleic acids research.

[95]  S. Henikoff,et al.  Automated assembly of protein blocks for database searching. , 1991, Nucleic acids research.

[96]  Structure of two retrons of Escherichia coli and their common chromosomal insertion site , 1991, Molecular microbiology.

[97]  G D Stormo,et al.  Probing information content of DNA-binding sites. , 1991, Methods in enzymology.

[98]  L. Bossi,et al.  Common sequence determinants of the response of a prokaryotic promoter to DNA bending and supercoiling. , 1991, The EMBO journal.

[99]  C. Burks,et al.  Identifying potential tRNA genes in genomic DNA sequences. , 1991, Journal of molecular biology.

[100]  P. V. von Hippel,et al.  A thermodynamic analysis of RNA transcript elongation and termination in Escherichia coli. , 1991, Biochemistry.

[101]  M J Sternberg,et al.  Prediction of ATP/GTP-binding motif: a comparison of a perceptron type neural network and a consensus sequence method [corrected]. , 1991, Protein engineering.

[102]  R. Staden,et al.  A sequence assembly and editing program for efficient management of large projects. , 1991, Nucleic acids research.

[103]  E. Gilson,et al.  Palindromic units are part of a new bacterial interspersed mosaic element (BIME). , 1991, Nucleic acids research.

[104]  Webb Miller,et al.  Improved algorithms for searching restriction maps , 1991, Comput. Appl. Biosci..

[105]  S Karlin,et al.  An efficient algorithm for identifying matches with errors in multiple long molecular sequences. , 1991, Journal of molecular biology.

[106]  S Karlin,et al.  Assessment of inhomogeneities in an E. coli physical map. , 1991, Nucleic acids research.

[107]  G. Zhou,et al.  Neural network optimization for E. coli promoter prediction. , 1991, Nucleic acids research.

[108]  A. Danchin,et al.  Escherichia coli molecular genetic map (1500 kbp): update II , 1990, Molecular microbiology.

[109]  M. O'Neill,et al.  Training back-propagation neural networks to define and detect DNA-binding sites. , 1991, Nucleic acids research.

[110]  Ming Li,et al.  An Introduction to Kolmogorov Complexity and Its Applications , 1997, Texts in Computer Science.

[111]  T. Smith,et al.  Corruption of genomic databases with anomalous sequence. , 1992, Nucleic acids research.

[112]  J. Collado-Vides,et al.  Grammatical model of the regulation of gene expression. , 1992, Proceedings of the National Academy of Sciences of the United States of America.

[113]  R. Staden,et al.  A standard file format for data from DNA sequencing instruments. , 1992, DNA sequence : the journal of DNA sequencing and mapping.

[114]  Peter D. Karp A knowledge base of the chemical compounds of intermediary metabolism , 1992, Comput. Appl. Biosci..

[115]  G. Stormo,et al.  Translation initiation in Escherichia coli: sequences within the ribosome‐binding site , 1992, Molecular microbiology.

[116]  Mark Borodovsky,et al.  First and second moment of counts of words in random texts generated by Markov chains , 1992, Comput. Appl. Biosci..

[117]  H. P. Yockey,et al.  Information Theory And Molecular Biology , 1992 .

[118]  J. Risler,et al.  A comparison of several similarity indices used in the classification of protein sequences: a multivariate analysis. , 1992, Nucleic acids research.

[119]  M Kanehisa,et al.  An assessment of neural network and statistical approaches for prediction of E. coli promoter sites. , 1992, Nucleic acids research.

[120]  S Karlin,et al.  Statistical analyses of counts and distributions of restriction sites in DNA sequences. , 1992, Nucleic acids research.

[121]  Peter Salamon,et al.  A Maximum Entropy Principle for the Distribution of Local Complexity in Naturally Occurring Nucleotide Sequences , 1992, Comput. Chem..

[122]  A. Bhagwat,et al.  DNA mismatch correction by Very Short Patch repair may have altered the abundance of oligonucleotides in the E. coli genome. , 1992, Nucleic acids research.

[123]  G. Stormo,et al.  Expectation maximization algorithm for identifying protein-binding sites with variable lengths from unaligned DNA fragments. , 1992, Journal of molecular biology.

[124]  R. Merkl,et al.  Statistical evaluation and biological interpretation of non-random abundance in the E. coli K-12 genome of tetra- and pentanucleotide sequences related to VSP DNA mismatch repair. , 1992, Nucleic acids research.

[125]  Changhwan Lee,et al.  Redesigning, implementing and integrating Escherichia coli genome software tools with an object-oriented database system , 1992, Comput. Appl. Biosci..

[126]  S Letovsky,et al.  Genome-related datasets within the E. coli Genetic Stock Center database. , 1992, Nucleic acids research.

[127]  X. Huang,et al.  A contig assembly program based on sensitive detection of fragment overlaps. , 1992, Genomics.

[128]  D. Shub,et al.  Self-splicing introns in tRNA genes of widely divergent bacteria , 1992, Nature.

[129]  A. Bairoch,et al.  The SWISS-PROT protein sequence data bank. , 1991, Nucleic acids research.

[130]  S. Karlin,et al.  Over- and under-representation of short oligonucleotides in DNA sequences. , 1992, Proceedings of the National Academy of Sciences of the United States of America.

[131]  X. Huang,et al.  Dynamic programming algorithms for restriction map comparison , 1992, Comput. Appl. Biosci..

[132]  Michael C. O'Neill,et al.  Escherichia coli promoters: neural networks develop distinct descriptions in learning to search for promoters of different spacing classes , 1992, Nucleic Acids Res..

[133]  C. Gross,et al.  Polypeptides containing highly conserved regions of transcription initiation factor σ 70 exhibit specificity of binding to promoter DNA , 1992, Cell.

[134]  C. Peng,et al.  Long-range correlations in nucleotide sequences , 1992, Nature.

[135]  Mark Borodovsky,et al.  GENMARK: Parallel Gene Recognition for Both DNA Strands , 1993, Comput. Chem..

[136]  Lars Kai Hansen,et al.  On the Robustness of Maximum Entropy Relationships for Complexity Distributions of Nucleotide Sequences , 1993, Comput. Chem..

[137]  S Karlin,et al.  Significant dispersed recurrent DNA sequences in the Escherichia coli genome. Several new groups. , 1993, Journal of molecular biology.

[138]  Hans-Werner Mewes,et al.  The PIR-International databases , 1993, Nucleic Acids Res..

[139]  Ross A. Overbeek,et al.  The ribosomal database project , 1992, Nucleic Acids Res..

[140]  Peter D. Karp,et al.  Representations of Metabolic Knowledge , 1993, ISMB.

[141]  J. Collado-Vides,et al.  The elements for a classification of units of genetic information with a combinatorial component. , 1993, Journal of theoretical biology.

[142]  G Perrière,et al.  ColiGene: object-centered representation for the study of E coli gene expressivity by sequence analysis. , 1993, Biochimie.

[143]  J Lebbe,et al.  Local predictability in biological sequences, algorithm and applications. , 1993, Biochimie.

[144]  P. Slonimski,et al.  A data‐base of chromosome III of Saccharomyces cerevisiae , 1993, Yeast.

[145]  S Karlin,et al.  Assessments of DNA inhomogeneities in yeast chromosome III. , 1993, Nucleic acids research.

[146]  Thure Etzold,et al.  SRS - an indexing and retrieval tool for flat file data libraries , 1993, Comput. Appl. Biosci..

[147]  O. Gascuel Inductive learning and biological sequence analysis. The PLAGE program. , 1993, Biochimie.

[148]  Aleksandar Milosavljevic,et al.  Discovering simple DNA sequences by the algorithmic significance method , 1993, Comput. Appl. Biosci..

[149]  V. Emilsson,et al.  Growth-rate-dependent accumulation of twelve tRNA species in Escherichia coli. , 1993, Journal of molecular biology.

[150]  Antoine Danchin,et al.  METALGEN.DB: metabolism linked to the genome of Escherichia coli, a graphics-oriented database , 1993, Comput. Appl. Biosci..

[151]  A Danchin,et al.  Colibri: a functional data base for the Escherichia coli genome. , 1993, Microbiological reviews.

[152]  John C. Wootton,et al.  Statistics of Local Complexity in Amino Acid Sequences and Sequence Databases , 1993, Comput. Chem..

[153]  Lloyd Allison,et al.  Reconstruction of strings past , 1993, Comput. Appl. Biosci..

[154]  Peter D. Karp,et al.  Representations of Metabolic Knowledge: Pathways , 1994, ISMB.

[155]  Alain Hénaut,et al.  A global approach for contig construction , 1994, Comput. Appl. Biosci..

[156]  Jan van Duin,et al.  Translational initiation on structured messengers: another role for the Shine-Dalgarno interaction , 1994 .

[157]  R Wahl,et al.  ECD--a totally integrated database of Escherichia coli K12. , 1994, Nucleic acids research.

[158]  C Lefèvre,et al.  A fast word search algorithm for the representation of sequence similarity in genomic DNA. , 1994, Nucleic acids research.

[159]  T. D. Schneider,et al.  Quantitative analysis of ribosome binding sites in E.coli. , 1994, Nucleic acids research.

[160]  B Butler Nucleic acid sequence analysis software packages. , 1994, Current opinion in biotechnology.

[161]  R. Doolittle Protein sequence comparisons: searching databases and aligning sequences. , 1994, Current opinion in biotechnology.

[162]  F. Lisacek,et al.  Automatic identification of group I intron cores in genomic DNA sequences. , 1994, Journal of molecular biology.

[163]  Stanley Letovsky,et al.  Issues in the development of complex scientific databases , 1994, 1994 Proceedings of the Twenty-Seventh Hawaii International Conference on System Sciences.

[164]  S. Altschul,et al.  Issues in searching molecular sequence databases , 1994, Nature Genetics.

[165]  H. Noller,et al.  Footprinting mRNA‐ribosome complexes with chemical probes. , 1994, The EMBO journal.

[166]  E. Sonnhammer,et al.  Modular arrangement of proteins as inferred from analysis of homology , 1994, Protein science : a publication of the Protein Society.

[167]  F. Michel,et al.  Multiple group II self-splicing introns in mobile DNA from Escherichia coli. , 1994, Comptes rendus de l'Academie des sciences. Serie III, Sciences de la vie.

[168]  R Harper,et al.  Access to DNA and protein databases on the Internet. , 1994, Current opinion in biotechnology.

[169]  A Danchin,et al.  SubtiList: a relational database for the Bacillus subtilis genome. , 1995, Microbiology.

[170]  R. Wahl,et al.  ECDC--a totally integrated and interactively usable genetic map of Escherichia coli K12. , 1995, Microbiological research.