An overview of protein-folding techniques: issues and perspectives

The importance of protein folding has been recognised for many years. Almost half a century ago, Linus Pauling discovered two quite simple, regular arrangements of amino acids, the alpha-helix and the beta-sheet, that are found in almost every protein. In the early 1960s, Christian Anfinsen showed that proteins actually "tie" themselves: if proteins become unfolded, they fold back into their proper shape of their own accord; no shaper or folder is needed. The nature of the unfolded state plays a great role in understanding proteins. Recent discoveries show that Alzheimer's disease, cystic fibrosis, mad cow disease, an inherited form of emphysema, and many cancers, all apparently unrelated diseases, result from protein folding gone wrong. Theoretical and computational studies have recently achieved noticeable success in reproducing various features of the folding mechanism of several small to medium-sized fast-folding proteins. This survey presents the state of the art in protein structure prediction methods from a computer scientist's perspective.
