A historical perspective of template-based protein structure prediction.

This chapter presents a broad historical overview of protein structure prediction. It introduces the main classes of prediction methods in historical context: homology modeling, fold recognition (FR) or protein threading, ab initio (de novo) approaches, and hybrid techniques that combine several of these. It then reviews the progress of the field as a whole, especially in threading/FR, as reflected in the CASP and CAFASP contests. The chapter closes with a discussion of the challenges that remain in protein structure prediction.
