Toward optimal fragment generations for ab initio protein structure assembly

Fragment assembly using structural motifs excised from other solved proteins has shown to be an efficient method for ab initio protein‐structure prediction. However, how to construct accurate fragments, how to derive optimal restraints from fragments, and what the best fragment length is are the basic issues yet to be systematically examined. In this work, we developed a gapless‐threading method to generate position‐specific structure fragments. Distance profiles and torsion angle pairs are then derived from the fragments by statistical consistency analysis, which achieved comparable accuracy with the machine‐learning‐based methods although the fragments were taken from unrelated proteins. When measured by both accuracies of the derived distance profiles and torsion angle pairs, we come to a consistent conclusion that the optimal fragment length for structural assembly is around 10, and at least 100 fragments at each location are needed to achieve optimal structure assembly. The distant profiles and torsion angle pairs as derived by the fragments have been successfully used in QUARK for ab initio protein structure assembly and are provided by the QUARK online server at http://zhanglab.ccmb. med.umich.edu/QUARK/. Proteins 2013. © 2012 Wiley Periodicals, Inc.

[1]  B. Rost Twilight zone of protein sequence alignments. , 1999, Protein engineering.

[2]  Yang Zhang,et al.  Automated protein structure modeling in CASP9 by I‐TASSER pipeline combined with QUARK‐based ab initio folding and FG‐MD‐based structure refinement , 2011, Proteins.

[3]  D T Jones,et al.  Protein secondary structure prediction based on position-specific scoring matrices. , 1999, Journal of molecular biology.

[4]  Michael Habeck,et al.  HHfrag: HMM-based fragment detection using HHpred , 2011, Bioinform..

[5]  M. Levitt,et al.  Small libraries of protein fragments model native protein structures accurately. , 2002, Journal of molecular biology.

[6]  Osvaldo Graña,et al.  Assessment of domain boundary predictions and the prediction of intramolecular contacts in CASP8 , 2009, Proteins.

[7]  J. Skolnick,et al.  TOUCHSTONE II: a new approach to ab initio protein structure prediction. , 2003, Biophysical journal.

[8]  Liam J McGuffin,et al.  Assembling novel protein folds from super‐secondary structural fragments , 2003, Proteins.

[9]  SödingJohannes Protein homology detection by HMM--HMM comparison , 2005 .

[10]  David Baker,et al.  Protein Structure Prediction Using Rosetta , 2004, Numerical Computer Methods, Part D.

[11]  Sitao Wu,et al.  LOMETS: A local meta-threading-server for protein structure prediction , 2007, Nucleic acids research.

[12]  Thomas A. Hopf,et al.  Protein 3D Structure Computed from Evolutionary Sequence Variation , 2011, PloS one.

[13]  William R Taylor,et al.  Prediction of protein structure from ideal forms , 2008, Proteins.

[14]  Sitao Wu,et al.  ANGLOR: A Composite Machine-Learning Algorithm for Protein Backbone Torsion Angle Prediction , 2008, PloS one.

[15]  J. Skolnick,et al.  Automated structure prediction of weakly homologous proteins on a genomic scale. , 2004, Proceedings of the National Academy of Sciences of the United States of America.

[16]  D. Eisenberg,et al.  A method to identify protein sequences that fold into a known three-dimensional structure. , 1991, Science.

[17]  Tom Lenaerts,et al.  Reconstruction of Protein Backbones from the BriX Collection of Canonical Protein Fragments , 2008, PLoS Comput. Biol..

[18]  M. Karplus,et al.  CHARMM: A program for macromolecular energy, minimization, and dynamics calculations , 1983 .

[19]  S. Henikoff,et al.  Position-based sequence weights. , 1994, Journal of molecular biology.

[20]  A. Tramontano,et al.  Evaluation of residue–residue contact predictions in CASP9 , 2011, Proteins.

[21]  Y Xu,et al.  Protein threading using PROSPECT: Design and evaluation , 2000, Proteins.

[22]  Yang Zhang,et al.  A comprehensive assessment of sequence-based and template-based methods for protein contact prediction , 2008, Bioinform..

[23]  T. N. Bhat,et al.  The Protein Data Bank , 2000, Nucleic Acids Res..

[24]  Yang Zhang,et al.  Generating Triangulated Macromolecular Surfaces by Euclidean Distance Transform , 2009, PloS one.

[25]  Robert P. Sheridan,et al.  3D Protein Structure Predicted from Sequence , 2011, ArXiv.

[26]  C Kooperberg,et al.  Assembly of protein tertiary structures from fragments with similar local sequences using simulated annealing and Bayesian scoring functions. , 1997, Journal of molecular biology.

[27]  D. Eisenberg,et al.  An evolutionary approach to folding small alpha-helical proteins that uses sequence information and an empirical guiding fitness function. , 1994, Proceedings of the National Academy of Sciences of the United States of America.

[28]  A. Szilágyi,et al.  Improving protein structure prediction using multiple sequence-based contact predictions. , 2011, Structure.

[29]  David Baker,et al.  The dual role of fragments in fragment‐assembly methods for de novo protein structure prediction , 2012, Proteins.

[30]  D.M. Mount,et al.  An Efficient k-Means Clustering Algorithm: Analysis and Implementation , 2002, IEEE Trans. Pattern Anal. Mach. Intell..

[31]  Pierre Baldi,et al.  Improved residue contact prediction using support vector machines and a large feature set , 2007, BMC Bioinformatics.

[32]  S. Rackovsky,et al.  Calculation of protein backbone geometry from α‐carbon coordinates based on peptide‐group dipole alignment , 1993, Protein science : a publication of the Protein Society.

[33]  Hongyi Zhou,et al.  Fold recognition by combining sequence profiles derived from evolution and from depth‐dependent structural alignment of fragments , 2004, Proteins.

[34]  Yang Zhang,et al.  Recognizing protein substructure similarity using segmental threading. , 2010, Structure.

[35]  David C. Jones Predicting novel protein folds by using FRAGFOLD , 2001, Proteins.

[36]  Yang Zhang,et al.  Ab initio protein structure assembly using continuous structure fragments and optimized knowledge‐based force field , 2012, Proteins.

[37]  J. Skolnick,et al.  Ab initio protein structure prediction using chunk-TASSER. , 2007, Biophysical journal.

[38]  Anna R Panchenko,et al.  Finding weak similarities between proteins by sequence profile comparison. , 2003, Nucleic acids research.

[39]  Sitao Wu,et al.  MUSTER: Improving protein sequence profile–profile alignments by using multiple sources of structure information , 2008, Proteins.

[40]  David E. Kim,et al.  Physically realistic homology models built with ROSETTA can be more accurate than their templates. , 2006, Proceedings of the National Academy of Sciences of the United States of America.

[41]  Thomas L. Madden,et al.  Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. , 1997, Nucleic acids research.

[42]  C. Floudas,et al.  ASTRO-FOLD: a combinatorial and global optimization framework for Ab initio prediction of three-dimensional structures of proteins from the amino acid sequence. , 2003, Biophysical journal.

[43]  Geoffrey E. Hinton,et al.  Learning representations by back-propagating errors , 1986, Nature.

[44]  Richard Hughey,et al.  Hidden Markov models for detecting remote protein homologies , 1998, Bioinform..

[45]  Kevin Karplus,et al.  Contact prediction using mutual information and neural nets , 2007, Proteins.

[46]  Yang Zhang,et al.  SPICKER: A clustering approach to identify near‐native protein folds , 2004, J. Comput. Chem..

[47]  J. Skolnick,et al.  Ab initio modeling of small proteins by iterative TASSER simulations , 2007, BMC Biology.

[48]  J. Skolnick,et al.  TM-align: a protein structure alignment algorithm based on the TM-score , 2005, Nucleic acids research.

[49]  W. Kabsch,et al.  Dictionary of protein secondary structure: Pattern recognition of hydrogen‐bonded and geometrical features , 1983, Biopolymers.