Protein Structure Prediction by Protein Threading

The seminal work of Bowie, Lüthy, and Eisenberg (Bowie et al., 1991) on “the inverse protein folding problem” laid the foundation for protein structure prediction by protein threading. Using simple measures of how well each amino acid type fits a local structural environment, defined in terms of solvent accessibility and protein secondary structure, the authors derived a simple yet profoundly novel approach to assessing whether a protein sequence fits a given protein structural fold. Their follow-up work (Elofsson et al., 1996; Fischer and Eisenberg, 1996; Fischer et al., 1996a,b) and the work of Jones, Taylor, and Thornton (Jones et al., 1992) on protein fold recognition led to the development of a new class of powerful tools for protein structure prediction, now termed “protein threading.” These computational tools have played a key role in extending the utility of the structures solved experimentally by X-ray crystallography and nuclear magnetic resonance (NMR), providing structural models and functional predictions for many of the proteins encoded in the hundreds of genomes sequenced to date.
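
The environment-based fitness measure described above can be illustrated with a minimal sketch. It is not the authors' implementation: the two environment classes, the tiny set of observations, the pseudocount, and the function names `log_odds_table` and `fit_score` are all hypothetical, chosen only to show how a log-odds score of amino acid type against local environment (burial plus secondary structure) can be summed along an ungapped sequence-to-structure assignment.

```python
import math
from collections import Counter


def log_odds_table(observations, pseudocount=1.0):
    """Build log-odds scores ln(P(aa | env) / P(aa)) from (aa, env) pairs.

    `observations` stands in for residue/environment pairs that would be
    collected from known structures; here it is toy data (an assumption).
    """
    observations = list(observations)
    pair_counts = Counter(observations)
    aa_counts = Counter(aa for aa, _ in observations)
    env_counts = Counter(env for _, env in observations)
    total = len(observations)
    n_aa = len(aa_counts)

    table = {}
    for env, env_n in env_counts.items():
        for aa, aa_n in aa_counts.items():
            # Pseudocounts keep unseen (aa, env) pairs from giving log(0).
            p_aa_given_env = (pair_counts[(aa, env)] + pseudocount) / (
                env_n + pseudocount * n_aa
            )
            p_aa = aa_n / total
            table[(aa, env)] = math.log(p_aa_given_env / p_aa)
    return table


def fit_score(sequence, environments, table):
    """Sum per-position scores for an ungapped sequence/structure pairing."""
    if len(sequence) != len(environments):
        raise ValueError("sequence and environment string must have equal length")
    return sum(table[(aa, env)] for aa, env in zip(sequence, environments))


# Hypothetical environment classes: (burial, secondary structure).
BURIED_HELIX = ("buried", "helix")
EXPOSED_COIL = ("exposed", "coil")

# Toy observations, invented for illustration only: leucine (L) is mostly
# buried in helices, lysine (K) is mostly exposed in coil.
toy_observations = (
    [("L", BURIED_HELIX)] * 4
    + [("K", EXPOSED_COIL)] * 4
    + [("L", EXPOSED_COIL), ("K", BURIED_HELIX)]
)

toy_table = log_odds_table(toy_observations)
fold_environments = [BURIED_HELIX, BURIED_HELIX, EXPOSED_COIL, EXPOSED_COIL]

compatible = fit_score("LLKK", fold_environments, toy_table)
incompatible = fit_score("KKLL", fold_environments, toy_table)
print(f"LLKK: {compatible:.2f}  KKLL: {incompatible:.2f}")
```

In this toy setting, the sequence whose residues match the environments favored in the observations scores higher than the reversed sequence. A practical threading method would additionally need gapped alignment and a much richer set of environment classes and statistics, which this sketch leaves out.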

[1]  A. Panchenko,et al.  Threading with explicit models for evolutionary conservation of structure and sequence , 1999, Proteins.

[2]  M. Sippl Calculation of conformational ensembles from potentials of mean force. An approach to the knowledge-based prediction of local structures in globular proteins. , 1990, Journal of molecular biology.

[3]  Hongyi Zhou,et al.  Distance‐scaled, finite ideal‐gas reference state improves structure‐derived potentials of mean force for structure selection and stability prediction , 2002, Protein science : a publication of the Protein Society.

[4]  R. Schulz,et al.  Protein Structure Prediction , 2020, Methods in Molecular Biology.

[5]  N S Wingreen,et al.  Are protein folds atypical? , 1997, Proceedings of the National Academy of Sciences of the United States of America.

[6]  D. T. Jones,et al.  A new approach to protein fold recognition , 1992, Nature.

[7]  Robert M. May,et al.  How Many Species Are There on Earth? , 1988, Science.

[8]  Hans L. Bodlaender A linear time algorithm for finding tree-decompositions of small treewidth , 1993, STOC '93.

[9]  Igor F. Tsigelny Protein Structure Prediction: Bioinformatic Approach , 2002 .

[10]  F. Melo,et al.  Novel knowledge-based mean force potential at atomic level. , 1997, Journal of molecular biology.

[11]  Ying Xu,et al.  Protein structure prediction using sparse dipolar coupling data. , 2004, Nucleic acids research.

[12]  D Fischer,et al.  Assigning amino acid sequences to 3‐dimensional protein folds , 1996, FASEB journal : official publication of the Federation of American Societies for Experimental Biology.

[13]  C. Murray,et al.  Protein fold recognition by threading: comparison of algorithms and analysis of results. , 1995, Protein engineering.

[14]  Ming Li,et al.  Assessment of RAPTOR's linear programming approach in CAFASP3 , 2003, Proteins.

[15]  C. Laymon A. study , 2018, Predication and Ontology.

[16]  Daniel Fischer,et al.  3D‐SHOTGUN: A novel, cooperative, fold‐recognition meta‐predictor , 2003, Proteins.

[17]  D. Fischer,et al.  Protein fold recognition using sequence‐derived predictions , 1996, Protein science : a publication of the Protein Society.

[18]  Liisa Holm,et al.  Identification of homology in protein structure classification , 2001, Nature Structural Biology.

[19]  C DeLisi,et al.  Estimating the number of protein folds. , 1998, Journal of molecular biology.

[20]  Ying Xu,et al.  Protein domain decomposition using a graph-theoretic approach , 2000, Bioinform..

[21]  E. Koonin,et al.  Prediction of transcription regulatory sites in Archaea by a comparative genomic approach. , 2000, Nucleic acids research.

[22]  B. Honig,et al.  A hierarchical approach to all‐atom protein loop prediction , 2004, Proteins.

[23]  A. Panchenko,et al.  Combination of threading potentials and sequence profiles improves fold recognition. , 2000, Journal of molecular biology.

[24]  Chris Bailey-Kellogg,et al.  Probabilistic cross‐link analysis and experiment planning for high‐throughput elucidation of protein structure , 2004, Protein science : a publication of the Protein Society.

[25]  Edward C. Uberbacher,et al.  Sequence-structure specificity of a knowledge based energy function at the secondary structure level , 2000, Bioinform..

[26]  Michael Q. Zhang,et al.  Current Topics in Computational Molecular Biology , 2002 .

[27]  Michael Y. Galperin,et al.  The COG database: a tool for genome-scale analysis of protein functions and evolution , 2000, Nucleic Acids Res..

[28]  J. Prestegard,et al.  New techniques in structural NMR — anisotropic interactions , 1998, Nature Structural Biology.

[29]  T L Blundell,et al.  FUGUE: sequence-structure homology recognition using environment-specific substitution tables and structure-dependent gap penalties. , 2001, Journal of molecular biology.

[30]  J Lundström,et al.  Pcons: A neural‐network–based consensus predictor that improves fold recognition , 2001, Protein science : a publication of the Protein Society.

[31]  Jiye Shi,et al.  HOMSTRAD: adding sequence information to structure-based alignments of homologous protein families , 2001, Bioinform..

[32]  B. Bosch,et al.  Cleavage Inhibition of the Murine Coronavirus Spike Protein by a Furin-Like Enzyme Affects Cell-Cell but Not Virus-Cell Fusion , 2004, Journal of Virology.

[33]  Pierre-Yves Calland On the structural complexity of a protein. , 2003, Protein engineering.

[34]  J. Skolnick,et al.  Erratum: Scoring function for automated assessment of protein structure template quality (Proteins: Structure, Function and Genetics (2004) 57, (702-710)) , 2007 .

[35]  P. Munson,et al.  Statistical significance of hierarchical multi‐body potentials based on Delaunay tessellation and their application in sequence‐structure alignment , 1997, Protein science : a publication of the Protein Society.

[36]  Hao Li,et al.  Designability of protein structures: A lattice‐model study using the Miyazawa‐Jernigan matrix , 2002, Proteins.

[37]  M Gerstein,et al.  A structural census of genomes: comparing bacterial, eukaryotic, and archaeal genomes in terms of protein structure. , 1997, Journal of molecular biology.

[38]  A Kolinski,et al.  The protein folding problem: a biophysical enigma. , 2002, Current pharmaceutical biotechnology.

[39]  Ying Xu,et al.  Raptor: Optimal Protein Threading by Linear Programming , 2003, J. Bioinform. Comput. Biol..

[40]  R. Lathrop The protein threading problem with sequence amino acid interaction preferences is NP-complete. , 1994, Protein engineering.

[41]  J. Thornton,et al.  PROCHECK: a program to check the stereochemical quality of protein structures , 1993 .

[42]  Patricia C Babbitt,et al.  Can sequence determine function? , 2000, Genome Biology.

[43]  Thomas Lengauer,et al.  Confidence measures for protein fold recognition , 2002, Bioinform..

[44]  O. Ptitsyn,et al.  Why do globular proteins fit the limited set of folding patterns? , 1987, Progress in biophysics and molecular biology.

[45]  D. Wetlaufer Nucleation, rapid folding, and globular intrachain regions in proteins. , 1973, Proceedings of the National Academy of Sciences of the United States of America.

[46]  A Elofsson,et al.  Assessing the performance of fold recognition methods by means of a comprehensive benchmark. , 1996, Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing.

[47]  S F Altschul,et al.  Local alignment statistics. , 1996, Methods in enzymology.

[48]  Ying Xu,et al.  Protein structure determination using protein threading and sparse NMR data (extended abstract) , 1999, RECOMB '00.

[49]  C. Chothia,et al.  Understanding protein structure: using scop for fold interpretation. , 1996, Methods in enzymology.

[50]  Liam J. McGuffin,et al.  Improvement of the GenTHREADER Method for Genomic Fold Recognition , 2003, Bioinform..

[51]  M. Sternberg,et al.  A strategy for the rapid multiple alignment of protein sequences. Confidence levels from tertiary structure comparisons. , 1987, Journal of molecular biology.

[52]  C Sander,et al.  Mapping the Protein Universe , 1996, Science.

[53]  R. Doolittle The multiplicity of domains in proteins. , 1995, Annual review of biochemistry.

[54]  Stefan Arnborg,et al.  Linear time algorithms for NP-hard problems restricted to partial k-trees , 1989, Discret. Appl. Math..

[55]  M. Gerstein How representative are the known structures of the proteins in a complete genome? A comprehensive structural census. , 1998, Folding & design.

[56]  U. Hobohm,et al.  Selection of representative protein data sets , 1992, Protein science : a publication of the Protein Society.

[57]  Ilya N. Shindyalov,et al.  PDP: protein domain parser , 2003, Bioinform..

[58]  D Fischer,et al.  Hybrid fold recognition: combining sequence derived properties with evolutionary information. , 1999, Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing.

[59]  Jacquelyn S. Fetrow,et al.  Structural genomics and its importance for gene function analysis , 2000, Nature Biotechnology.

[60]  William W. Chen,et al.  Fold recognition with minimal gaps , 2003, Proteins.

[61]  Guoli Wang,et al.  PISCES: a protein sequence culling server , 2003, Bioinform..

[62]  Dong Xu,et al.  Characterization of protein structure and function at genome scale with a computational prediction pipeline. , 2003, Genetic engineering.

[63]  A. Godzik,et al.  Similarities and differences between nonhomologous proteins with similar folds: evaluation of threading strategies. , 1997, Folding & design.

[64]  T. Blundell,et al.  Definition of general topological equivalence in protein structures. A procedure involving comparison of properties and relationships through simulated annealing and dynamic programming. , 1990, Journal of molecular biology.

[65]  Michael Y. Galperin,et al.  Comparative genomics of the Archaea (Euryarchaeota): evolution of conserved protein families, the stable core, and the variable shell. , 1999, Genome research.

[66]  R. Samudrala,et al.  An all-atom distance-dependent conditional probability discriminatory function for protein structure prediction. , 1998, Journal of molecular biology.

[67]  Jane K. Setlow,et al.  Genetic Engineering: Principles and Methods , 1979, Genetic Engineering: Principles and Methods.

[68]  M. Gerstein,et al.  Comparing genomes in terms of protein structure: surveys of a finite parts list. , 1998, FEMS microbiology reviews.

[69]  N. Wingreen,et al.  Emergence of Preferred Structures in a Simple Model of Protein Folding , 1996, Science.

[70]  John Moult,et al.  A unifold, mesofold, and superfold model of protein fold use , 2002, Proteins.

[71]  David C. Jones,et al.  CATH--a hierarchic classification of protein domain structures. , 1997, Structure.

[72]  J. Sorenson,et al.  Redesigning the hydrophobic core of a model β‐sheet protein: Destabilizing traps through a threading approach , 1999, Proteins.

[73]  S. Teichmann,et al.  Domain combinations in archaeal, eubacterial and eukaryotic proteomes. , 2001, Journal of molecular biology.

[74]  B. A. Reed,et al.  Algorithmic Aspects of Tree Width , 2003 .

[75]  Bo Yan,et al.  A graph-theoretic approach for the separation of b and y ions in tandem mass spectra , 2005, Bioinform..

[76]  Greg N. Frederickson,et al.  Planar graph decomposition and all pairs shortest paths , 1991, JACM.

[77]  Malin M. Young,et al.  High throughput protein fold identification by using experimental constraints derived from intramolecular cross-links and mass spectrometry , 2000, Proc. Natl. Acad. Sci. USA.

[78]  Terry Gaasterland,et al.  Structural genomics: Bioinformatics in the driver's seat , 1998, Nature Biotechnology.

[79]  Chris H. Q. Ding,et al.  Multi-class protein fold recognition using support vector machines and neural networks , 2001, Bioinform..

[80]  Ying Xu,et al.  Protein Threading by Linear Programming , 2003, Pacific Symposium on Biocomputing.

[81]  Protein Anatomy,et al.  The Anatomy and Taxonomy of Protein Structure , 2007 .

[82]  A G Murzin,et al.  SCOP: a structural classification of proteins database for the investigation of sequences and structures. , 1995, Journal of molecular biology.

[83]  M. Sternberg,et al.  Flexible protein sequence patterns. A sensitive method to detect weak structural similarities. , 1990, Journal of molecular biology.

[84]  Y Xu,et al.  Protein threading using PROSPECT: Design and evaluation , 2000, Proteins.

[85]  Helen M Berman,et al.  Large macromolecular complexes in the Protein Data Bank: a status report. , 2005, Structure.

[86]  A. Sali,et al.  Protein Structure Prediction and Structural Genomics , 2001, Science.

[87]  G Vriend,et al.  WHAT IF: a molecular modeling and drug design program. , 1990, Journal of molecular graphics.

[88]  Z. X. Wang,et al.  How many fold types of protein are there in nature? , 1996, Proteins.

[89]  Ying Xu,et al.  A polynomial-time algorithm for a class of protein threading problems , 1996, Comput. Appl. Biosci..

[90]  M J Sternberg,et al.  On the use of chemically derived distance constraints in the prediction of protein structure with myoglobin as an example. , 1980, Journal of molecular biology.

[91]  Chris Sander,et al.  The FSSP database: fold classification based on structure-structure alignment of proteins , 1996, Nucleic Acids Res..

[92]  David C. Jones,et al.  GenTHREADER: an efficient and reliable protein fold recognition method for genomic sequences. , 1999, Journal of molecular biology.

[93]  W R Taylor,et al.  A local alignment method for protein structure motifs. , 1993, Journal of molecular biology.

[94]  B. Rost,et al.  Protein fold recognition by prediction-based threading. , 1997, Journal of molecular biology.

[95]  Janet M. Thornton,et al.  Protein fold recognition , 1993, J. Comput. Aided Mol. Des..

[96]  Bonnie Berger,et al.  A tree-decomposition approach to protein structure prediction , 2005, 2005 IEEE Computational Systems Bioinformatics Conference (CSB'05).

[97]  Hongyi Zhou,et al.  Fold recognition by combining sequence profiles derived from evolution and from depth‐dependent structural alignment of fragments , 2004, Proteins.

[98]  Dong Xu,et al.  A PRACTICAL METHOD FOR INTERPRETATION OF THREADING SCORES: AN APPLICATION OF NEURAL NETWORK , 2002 .

[99]  Thomas Madej,et al.  Threading analysis suggests that the obese gene product may be a helical cytokine , 1995, FEBS letters.

[100]  Scott M. Le Grand,et al.  A study of combined structure/sequence profiles. , 1996, Folding & design.

[101]  Adam Godzik,et al.  Fold recognition methods. , 2005, Methods of biochemical analysis.

[102]  E Uberbacher,et al.  Protein threading by PROSPECT: a prediction experiment in CASP3. , 1999, Protein engineering.

[103]  Gilles Brassard,et al.  Fundamentals of Algorithmics , 1995 .

[104]  Ceslovas Venclovas,et al.  Assessment of progress over the CASP experiments , 2003, Proteins.

[105]  Jie Liang,et al.  Geometric cooperativity and anticooperativity of three‐body interactions in native proteins , 2005, Proteins.

[106]  J Skolnick,et al.  Defrosting the frozen approximation: PROSPECTOR— A new approach to threading , 2001, Proteins.

[107]  Yang Zhang,et al.  Scoring function for automated assessment of protein structure template quality , 2004, Proteins.

[108]  O. Lund,et al.  Protein distance constraints predicted by neural networks and probability density functions. , 1997, Protein engineering.

[109]  Kenneth Steiglitz,et al.  Combinatorial Optimization: Algorithms and Complexity , 1981 .

[110]  Arthur M. Lesk,et al.  Introduction to protein architecture : the structural biology of proteins , 2001 .

[111]  W R Taylor,et al.  Protein structure alignment. , 1989, Journal of molecular biology.

[112]  Leszek Rychlewski,et al.  mRNA Cap-1 Methyltransferase in the SARS Genome , 2003, Cell.

[113]  Liam J. McGuffin,et al.  The Genomic Threading Database: a comprehensive resource for structural annotations of the genomes from key organisms , 2004, Nucleic Acids Res..

[114]  Paul D. Seymour,et al.  Graph Minors. II. Algorithmic Aspects of Tree-Width , 1986, J. Algorithms.

[115]  Lisa N Kinch,et al.  CASP5 assessment of fold recognition target predictions , 2003, Proteins.

[116]  David T. Jones,et al.  Protein superfamilies and domain superfolds , 1994, Nature.

[117]  A. Elofsson,et al.  Multi-domain proteins in the three kingdoms of life: orphan domains and other unassigned regions. , 2005, Journal of molecular biology.

[118]  Ying Xu,et al.  PROSPECT-PSPP: an automatic computational pipeline for protein structure prediction , 2004, Nucleic Acids Res..

[119]  J H Prestegard,et al.  Nuclear magnetic dipole interactions in field-oriented proteins: information for structure determination in solution. , 1995, Proceedings of the National Academy of Sciences of the United States of America.

[120]  Dong Xu,et al.  PROSPECT II: protein structure prediction program for genome-scale applications. , 2003, Protein engineering.

[121]  M. Levitt,et al.  A unified statistical framework for sequence comparison and structure comparison. , 1998, Proceedings of the National Academy of Sciences of the United States of America.

[122]  C. Branden,et al.  Introduction to protein structure , 1991 .

[123]  Sarah A. Teichmann,et al.  An insight into domain combinations , 2001, ISMB.

[124]  E. Shakhnovich,et al.  SMoG: de Novo Design Method Based on Simple, Fast, and Accurate Free Energy Estimates. 1. Methodology and Supporting Evidence , 1996 .

[125]  Dong Xu,et al.  Application of computational biology in understanding emerging infectious diseases : inferring biological function for SM complex of SARS-CoV , 2004 .

[126]  Paul D. Seymour,et al.  Graph Minors: XV. Giant Steps , 1996, J. Comb. Theory, Ser. B.

[127]  E. Koonin,et al.  The structure of the protein universe and genome evolution , 2002, Nature.

[128]  Ying Xu,et al.  Protein Fold Recognition Through Application of Residual Dipolar Coupling Data , 2004, Pacific Symposium on Biocomputing.

[129]  J. Skolnick,et al.  A distance‐dependent atomic knowledge‐based potential for improved protein structure selection , 2001, Proteins.

[130]  D Xu,et al.  Model for the three‐dimensional structure of vitronectin: Predictions for the multi‐domain protein from threading and docking , 2001, Proteins.

[131]  Tim J. P. Hubbard,et al.  SCOP database in 2004: refinements integrate structure and sequence family data , 2004, Nucleic Acids Res..

[132]  D T Jones,et al.  Protein secondary structure prediction based on position-specific scoring matrices. , 1999, Journal of molecular biology.

[133]  D. Eisenberg,et al.  A method to identify protein sequences that fold into a known three-dimensional structure. , 1991, Science.

[134]  S. Bryant,et al.  Statistics of sequence-structure threading. , 1995, Current opinion in structural biology.

[135]  M J Sippl,et al.  Assessment of the CASP4 fold recognition category , 2001, Proteins.

[136]  Ying Xu,et al.  A Computational Method for NMR-Constrained Protein Threading , 2000, J. Comput. Biol..

[137]  L A Mirny,et al.  Statistical significance of protein structure prediction by threading. , 2000, Proceedings of the National Academy of Sciences of the United States of America.

[138]  G M Clore,et al.  Exploring the limits of precision and accuracy of protein structures determined by nuclear magnetic resonance spectroscopy. , 1993, Journal of molecular biology.

[139]  Hui Lu,et al.  Multimeric threading-based prediction of protein-protein interactions on a genomic scale: application to the Saccharomyces cerevisiae proteome. , 2003, Genome research.

[140]  Ying Xu,et al.  An Efficient Computational Method for Globally Optimal Threading , 1998, J. Comput. Biol..

[141]  Cathy H. Wu,et al.  The Universal Protein Resource (UniProt) , 2004, Nucleic Acids Res..