Genome-based structural biology.

Spectacular achievements in whole-genome sequencing open up new possibilities for structural research. Protein structures can now be studied in their natural genomic context. Conversely, structure prediction algorithms can be improved by exploiting species-specific tendencies in folding patterns. Finally, efficient strategies for selecting targets for structure determination can be devised. In this review we consider new computational approaches and results in protein structure analysis that stem from the availability of complete genomes.
