SSThread: Template‐free protein structure prediction by threading pairs of contacting secondary structures followed by assembly of overlapping pairs

Predicting the three‐dimensional structure of a protein from its amino acid sequence alone remains an unsolved problem, despite a great deal of work and significant progress on the subject. This article describes SSThread, a new template‐free algorithm. SSThread first makes several predictions of contacting pairs of α‐helices and β‐strands, which are derived from a database of experimental structures using a knowledge‐based potential, secondary structure prediction, and contact map prediction. It then assembles overlapping pair predictions into an ensemble of core structure predictions, whose loops are predicted last. On a set of seven CASP10 targets, SSThread outperformed the two leading methods for two targets each. All of the targets contain β‐strands, and most have a high relative contact order, which demonstrates the advantages of SSThread. Based on sets of 74 and 21 test cases, the primary bottlenecks are the pair prediction and loop prediction stages. © 2014 Wiley Periodicals, Inc.
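
The abstract characterizes the targets by their relative contact order. As a rough illustration of that quantity, the sketch below computes it in its commonly used form: the mean sequence separation of contacting residue pairs, divided by chain length. The function name, the use of Cα coordinates, the 8 Å cutoff, and the `min_separation` parameter are assumptions made for this example and are not taken from SSThread.

```python
import math


def relative_contact_order(coords, cutoff=8.0, min_separation=1):
    """Relative contact order from one (x, y, z) point per residue.

    Assumptions for this sketch (not SSThread's settings): residues are
    represented by a single point such as the C-alpha atom, and two
    residues are "in contact" when the points lie within `cutoff` angstroms.
    """
    n_res = len(coords)
    total_separation = 0  # sum of |i - j| over all contacting pairs
    n_contacts = 0
    for i in range(n_res):
        for j in range(i + min_separation, n_res):
            if math.dist(coords[i], coords[j]) <= cutoff:
                total_separation += j - i
                n_contacts += 1
    if n_contacts == 0:
        return 0.0
    # Mean sequence separation of contacts, normalized by chain length.
    return total_separation / (n_res * n_contacts)


if __name__ == "__main__":
    # Hypothetical toy chain: five residues on a straight line, 3.8 angstroms apart.
    toy_chain = [(3.8 * i, 0.0, 0.0) for i in range(5)]
    print(relative_contact_order(toy_chain))
```

Under this definition, a structure dominated by contacts between residues far apart in sequence, as in many β‐sheet topologies, yields a higher value than one dominated by local contacts.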