Fold predictions for bacterial genomes.

Fold assignments for newly sequenced genomes are among the most important and interesting applications of the rapidly growing field of protein structure prediction. We present a brief survey and discussion of the assignments completed to date, using several fold assignment projects for proteins from the Escherichia coli genome as examples. This review focuses on the steps needed to go beyond simple assignment projects and toward tools that extend our understanding of protein function in newly sequenced genomes. It also discusses several problems seldom addressed in the literature: domain prediction, complementary predictions (e.g., transmembrane regions and flexible regions), and cross-correlation of predictions from different servers. The influence of sequence and structure database growth on prediction success is also addressed. Finally, we discuss the prospects of the field in the context of massive sequence and structure determination projects, as well as the development of novel prediction methods.
