Rigorous performance evaluation in protein structure modelling and implications for computational biology

In principle, given the amino acid sequence of a protein, it is possible to compute the corresponding three-dimensional structure. Methods for modelling structure based on this premise have been under development for more than 40 years. For the past decade, a series of community-wide experiments, termed Critical Assessment of Structure Prediction (CASP), has assessed the state of the art, providing a detailed picture of what has been achieved in the field, where progress is being made, and what major problems remain. The rigorous evaluation procedures of CASP have been accompanied by substantial progress. Lessons from this area of computational biology suggest a set of principles for increasing rigor in the field as a whole.
