Bioinformatics: from genome data to biological knowledge.

Recently, molecular biologists have sequenced about a dozen bacterial genomes and the first eukaryotic genome. It is now possible to answer detailed questions about an organism's complete set of genes. Bioinformatics methods are increasingly used to attach biological knowledge to long lists of genes, assign genes to biological pathways, compare the gene sets of different species, identify specificity factors, and describe sets of highly conserved proteins common to all domains of life. Substantial progress has recently been made in the availability of primary and added-value databases, in algorithm development, and in network information services for genome analysis. The pharmaceutical industry has benefited greatly from the accumulation of sequence data, which has helped identify targets and candidates for the development of drugs, vaccines, diagnostic markers and therapeutic proteins.
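Comparing the gene sets of different species and extracting proteins conserved across all of them reduce, in their simplest form, to set operations once each predicted gene has been assigned to an orthologous group. The sketch below assumes that assignment has already been done. The genome names and group identifiers (`OG1`, `OG2`, ...) are hypothetical placeholders. `conserved_core` and `species_specific` are illustrative helpers, not functions from any particular tool.

```python
# Hypothetical data: each genome maps to the set of orthologous-group IDs
# assigned to its predicted genes (names and IDs are placeholders only).
GENOMES: dict[str, set[str]] = {
    "bacterium_A": {"OG1", "OG2", "OG3", "OG5"},
    "bacterium_B": {"OG1", "OG2", "OG4", "OG5"},
    "archaeon_C": {"OG1", "OG2", "OG5", "OG6"},
    "eukaryote_D": {"OG1", "OG5", "OG7"},
}


def conserved_core(genomes: dict[str, set[str]]) -> set[str]:
    """Return the groups present in every genome (a plain set intersection)."""
    if not genomes:
        return set()
    # set.intersection with unpacked arguments returns a new set.
    return set.intersection(*genomes.values())


def species_specific(genomes: dict[str, set[str]]) -> dict[str, set[str]]:
    """Return, for each genome, the groups found in no other genome."""
    result: dict[str, set[str]] = {}
    for name, groups in genomes.items():
        # Union of the groups in all *other* genomes.
        others = set().union(*(g for n, g in genomes.items() if n != name))
        result[name] = groups - others
    return result


if __name__ == "__main__":
    print(sorted(conserved_core(GENOMES)))  # ['OG1', 'OG5']
    for genome, unique in species_specific(GENOMES).items():
        print(genome, sorted(unique))
```

Strict intersection is only a first approximation. When unrelated proteins carry out the same function in different lineages (non-orthologous gene displacement), the function is conserved but no single group appears in every genome, so a real analysis would need to account for such cases.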
