Practical lessons from protein structure prediction

Despite recent efforts to develop automated protein structure determination protocols, structural genomics projects are slow to generate fold assignments for complete proteomes, and spatial structures remain unknown for many protein families. Cheap and fast alternative methods that assign folds using prediction algorithms continue to provide valuable structural information for many proteins. The development of high-quality prediction methods has been boosted in recent years by objective community-wide assessment experiments. This paper gives an overview of the currently available practical approaches to protein structure prediction that are capable of generating accurate fold assignments. Recent advances in the assessment of prediction quality are also discussed.

[57]  Arne Elofsson,et al.  MaxSub: an automated measure for the assessment of protein structure prediction quality , 2000, Bioinform..

[58]  International Human Genome Sequencing Consortium Initial sequencing and analysis of the human genome , 2001, Nature.

[59]  Arne Elofsson,et al.  A study of quality measures for protein threading models , 2001, BMC Bioinformatics.

[60]  C. Chothia,et al.  Assignment of homology to genome sequences using a library of hidden Markov models that represent all proteins of known structure. , 2001, Journal of molecular biology.

[61]  A. Valencia,et al.  Intrinsic errors in genome annotation. , 2001, Trends in genetics : TIG.

[62]  T L Blundell,et al.  FUGUE: sequence-structure homology recognition using environment-specific substitution tables and structure-dependent gap penalties. , 2001, Journal of molecular biology.

[63]  A. Sali,et al.  Protein Structure Prediction and Structural Genomics , 2001, Science.

[64]  P. Romero,et al.  Sequence complexity of disordered protein , 2001, Proteins.

[65]  Roland L. Dunbrack,et al.  CAFASP2: The second critical assessment of fully automated structure prediction methods , 2001, Proteins.

[66]  Liisa Holm,et al.  Picasso: generating a covering set of protein family profiles , 2001, Bioinform..

[67]  Richard Bonneau,et al.  Functional inferences from blind ab initio protein structure predictions. , 2001, Journal of structural biology.

[68]  D Fischer,et al.  LiveBench‐1: Continuous benchmarking of protein structure prediction servers , 2001, Protein science : a publication of the Protein Society.

[69]  M. Schumacher,et al.  Crystal structures of SarA, a pleiotropic regulator of virulence genes in S. aureus , 2001, Nature.

[70]  T. Hubbard,et al.  Critical assessment of methods of protein structure prediction (CASP)‐round V , 2003, Proteins.

[71]  D Fischer,et al.  LiveBench‐2: Large‐scale automated evaluation of protein structure prediction servers , 2001, Proteins.

[72]  J. Bujnicki,et al.  Identification of a PD-(D/E)XK-like domain with a novel configuration of the endonuclease active site in the methyl-directed restriction enzyme Mrr and its homologs. , 2001, Gene.

[73]  M J Sternberg,et al.  Enhancement of protein modeling by human intervention in applying the automatic programs 3D‐JIGSAW and 3D‐PSSM , 2001, Proteins.

[74]  Christopher J. Oldfield,et al.  Intrinsically disordered protein. , 2001, Journal of molecular graphics & modelling.

[75]  J Lundström,et al.  Pcons: A neural‐network–based consensus predictor that improves fold recognition , 2001, Protein science : a publication of the Protein Society.

[76]  D. T. Jones,et al.  Evaluating the potential of using fold-recognition models for molecular replacement. , 2001, Acta crystallographica. Section D, Biological crystallography.

[77]  K Karplus,et al.  What is the value added by human intervention in protein structure prediction? , 2001, Proteins.

[78]  B Rost,et al.  EVA: Large‐scale analysis of secondary structure prediction , 2001, Proteins.

[79]  A G Murzin,et al.  CASP2 knowledge‐based approach to distant homology recognition and fold prediction in CASP4 , 2001, Proteins.

[80]  Arne Elofsson,et al.  Structure prediction meta server , 2001, Bioinform..

[81]  Richard Bonneau,et al.  De novo prediction of three-dimensional structures for major protein families. , 2002, Journal of molecular biology.

[82]  D. Baker,et al.  De novo determination of protein backbone structure from residual dipolar couplings using Rosetta. , 2002, Journal of the American Chemical Society.

[83]  Leszek Rychlewski,et al.  Fold-recognition detects an error in the Protein Data Bank , 2002, Bioinform..

[84]  Golan Yona,et al.  Within the twilight zone: a sensitive profile-profile comparison tool based on information theory. , 2002, Journal of molecular biology.

[85]  Osvaldo Olmea,et al.  MAMMOTH (Matching molecular models obtained from theory): An automated method for model comparison , 2002, Protein science : a publication of the Protein Society.

[86]  Tim J. P. Hubbard,et al.  SCOP database in 2002: refinements accommodate structural genomics , 2002, Nucleic Acids Res..

[87]  D. Fischer,et al.  LiveBench‐6: Large‐scale automated evaluation of protein structure prediction servers , 2003, Proteins.

[88]  L. Hood,et al.  The digital code of DNA , 2003, Nature.

[89]  Jun Zhu,et al.  BioMagResBank database with sets of experimental NMR constraints corresponding to the structures of over 1400 biomolecules deposited in the Protein Data Bank , 2003, Journal of biomolecular NMR.

[90]  Robert B. Russell,et al.  GlobPlot: exploring protein sequences for globularity and disorder , 2003, Nucleic Acids Res..

[91]  Jens Meiler,et al.  Rosetta predictions in CASP5: Successes, failures, and prospects for complete automation , 2003, Proteins.

[92]  Ralf Zimmer,et al.  Profile-Profile Alignment: A Powerful Tool for Protein Structure Prediction , 2002, Pacific Symposium on Biocomputing.

[93]  Anna R Panchenko,et al.  Finding weak similarities between proteins by sequence profile comparison. , 2003, Nucleic acids research.

[94]  B. Rost,et al.  Critical assessment of methods of protein structure prediction (CASP)—Round 6 , 2005, Proteins.

[95]  Dong Xu,et al.  PROSPECT II: protein structure prediction program for genome-scale applications. , 2003, Protein engineering.

[96]  Roland L. Dunbrack,et al.  CAFASP3: The third critical assessment of fully automated structure prediction methods , 2003, Proteins.

[97]  Ram Samudrala,et al.  PROTINFO: secondary and tertiary protein structure prediction , 2003, Nucleic Acids Res..

[98]  Lars Malmström,et al.  Automated prediction of CASP‐5 structures using the Robetta server , 2003, Proteins.

[99]  J. Skolnick,et al.  TOUCHSTONEX: Protein structure prediction with sparse NMR data , 2003, Proteins.

[100]  D. Baker,et al.  A large scale test of computational protein design: folding and stability of nine completely redesigned globular proteins. , 2003, Journal of molecular biology.

[101]  S. Wodak,et al.  Assessment of blind predictions of protein–protein interactions: Current status of docking methods , 2003, Proteins.

[102]  Ying Xu,et al.  Protein Threading by Linear Programming , 2003, Pacific Symposium on Biocomputing.

[103]  D. Baker,et al.  Design of a Novel Globular Protein Fold with Atomic-Level Accuracy , 2003, Science.

[104]  John B. Anderson,et al.  CDD: a curated Entrez database of conserved domain alignments , 2003, Nucleic Acids Res..

[105]  D. Baker,et al.  Deciphering a novel thioredoxin‐like fold family , 2003, Proteins.

[106]  James E. Bray,et al.  The CATH database: an extended protein family resource for structural and functional genomics , 2003, Nucleic Acids Res..

[107]  Marcin von Grotthuss,et al.  ORFeus: detection of distant homology using sequence profiles and predicted secondary structure , 2003, Nucleic Acids Res..

[108]  N. Grishin,et al.  COMPASS: a tool for comparison of multiple protein alignments with assessment of statistical significance. , 2003, Journal of molecular biology.

[109]  Daniel Fischer,et al.  3D‐SHOTGUN: A novel, cooperative, fold‐recognition meta‐predictor , 2003, Proteins.

[110]  Daisuke Kihara,et al.  TOUCHSTONE: A unified approach to protein structure prediction , 2003, Proteins.

[111]  Arne Elofsson,et al.  3D-Jury: A Simple Approach to Improve Protein Structure Predictions , 2003, Bioinform..

[112]  Marc A. Martí-Renom,et al.  EVA: evaluation of protein structure prediction servers , 2003, Nucleic Acids Res..

[113]  Yutaka Akiyama,et al.  FORTE: a profile-profile comparison tool for protein fold recognition , 2004, Bioinform..

[114]  Yuichi Harano,et al.  Complete protein structure determination using backbone residual dipolar couplings and sidechain rotamer prediction , 2004, Journal of Structural and Functional Genomics.

[115]  Marcin von Grotthuss,et al.  Detecting distant homology with Meta-BASIC , 2004, Nucleic Acids Res..

[116]  Hongyi Zhou,et al.  Single‐body residue‐level knowledge‐based energy score combined with sequence‐profile and secondary structure information for fold recognition , 2004, Proteins.