Evaluation of protein structure prediction methods: Issues and strategies

The Internet abounds with tools for predicting protein structure from sequence, and it also provides access to databases of three-dimensional protein models. This wealth of methods and repositories can be very useful for designing experiments and interpreting their results, as attested by several examples in the literature. On the other hand, life scientists need to select the most appropriate resource for their problem of interest. The structural bioinformatics community has devised worldwide initiatives, described in this chapter, to objectively monitor the state of the art in the field. Here we discuss the challenges of assessing the accuracy of structural models, comparing different approaches, and detecting and measuring progress over time, together with some of the solutions adopted by the community. Finally, we briefly describe a few examples of protein structure analysis and prediction that have been instrumental in shedding light on relevant biomedical problems.
