Protein structure prediction begins well but ends badly

The accurate prediction of protein structure, both secondary and tertiary, remains an open problem. Over the years, many approaches have been implemented and assessed. Most prediction algorithms start with the entire amino acid sequence and treat all residues identically, regardless of sequence position. Here, we analyze blind prediction data to investigate whether predictive capability varies along the chain. We evaluate free modeling results from recent Critical Assessment of Techniques for Protein Structure Prediction (CASP) experiments, together with the most up‐to‐date data from EVA, a fully automated blind test of secondary structure prediction servers. The results demonstrate that structure prediction accuracy depends on sequence position. Both secondary and tertiary structure predictions are more accurate in regions near the amino (N)‐terminus than in analogous regions near the carboxy (C)‐terminus. Eight of the 10 secondary structure prediction algorithms assessed by EVA perform significantly better in regions at the N‐terminus. CASP data show a similar bias: N‐terminal fragments are predicted more accurately than C‐terminal fragments. For the CASP analysis, two analogous fragments are taken from each model: the N‐terminal fragment begins at the start of the most N‐terminal secondary structure element (SSE), whereas the C‐terminal fragment finishes at the end of the most C‐terminal SSE. Each fragment is locally superimposed onto its respective native fragment, and the relative terminal prediction accuracy, measured as RMSD, is calculated on an intramodel basis. At a fragment length of 20 residues, the N‐terminal fragment is predicted with greater accuracy in 79% of cases. Proteins 2010. © 2009 Wiley‐Liss, Inc.
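
The fragment comparison described in the abstract can be illustrated with a minimal sketch. This is not the authors' code. It assumes that the model and native structures are given as residue-aligned arrays of Cα coordinates, that the SSE boundaries are already known (for example from a secondary structure assignment program), and that local superposition is done with the standard Kabsch algorithm. The abstract does not specify any of these details, and the function names `kabsch_rmsd`, `terminal_fragments`, and `compare_termini` are hypothetical.

```python
import numpy as np


def kabsch_rmsd(P, Q):
    """RMSD between two (n, 3) coordinate sets after optimal superposition."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    if P.shape != Q.shape or P.ndim != 2 or P.shape[1] != 3:
        raise ValueError("P and Q must both have shape (n, 3)")
    # Remove translation by centring both fragments on their centroids.
    P = P - P.mean(axis=0)
    Q = Q - Q.mean(axis=0)
    # Kabsch: SVD of the covariance matrix gives the optimal rotation.
    U, _, Vt = np.linalg.svd(P.T @ Q)
    # Guard against an improper rotation (reflection).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    P_rot = P @ R.T
    return float(np.sqrt(((P_rot - Q) ** 2).sum() / len(P)))


def terminal_fragments(coords, first_sse_start, last_sse_end, length):
    """Return (N-terminal, C-terminal) fragments of the given length.

    first_sse_start: index of the first residue of the most N-terminal SSE.
    last_sse_end:    index of the last residue of the most C-terminal SSE
                     (inclusive). Both are assumed inputs.
    """
    coords = np.asarray(coords, dtype=float)
    c_start = last_sse_end - length + 1
    if c_start < 0 or first_sse_start + length > len(coords):
        raise ValueError("fragment length does not fit inside the chain")
    n_frag = coords[first_sse_start:first_sse_start + length]
    c_frag = coords[c_start:last_sse_end + 1]
    return n_frag, c_frag


def compare_termini(model, native, first_sse_start, last_sse_end, length=20):
    """Return (N-terminal RMSD, C-terminal RMSD) for one model."""
    m_n, m_c = terminal_fragments(model, first_sse_start, last_sse_end, length)
    t_n, t_c = terminal_fragments(native, first_sse_start, last_sse_end, length)
    # Each fragment is superimposed only onto its own native counterpart.
    return kabsch_rmsd(m_n, t_n), kabsch_rmsd(m_c, t_c)
```

Under these assumptions, a model counts as having a better-predicted N-terminus when the first returned value is smaller than the second. Because both values come from the same model, the comparison is intramodel, in the sense used above.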
