An Algorithm for Simultaneous Backbone Threading and Side-Chain Packing

To make full use of all available information in protein structure prediction, including both backbone and side-chain structures, we present a novel algorithm for solving a generalized threading problem. In this problem, backbone threading and side-chain packing are carried out simultaneously during protein structure prediction. Given a query protein sequence and a template structure, our goal is to find a threading alignment between the query sequence and the template structure, together with a rotamer assignment for each side chain of the query protein, that optimizes an energy function combining a backbone threading energy and a side-chain packing energy. We solve this computationally challenging problem by first formulating it as a graph-based optimization problem. To make the algorithm efficient enough for practical use, we employ various graph-theoretic techniques that exploit a number of special properties of the graph representing this generalized threading problem. The overall framework of our algorithm is a dynamic programming algorithm implemented on an optimal tree decomposition of this graph. Combined with additional heuristic techniques such as dead-end elimination, our algorithm solves the generalized threading problem within a practically acceptable amount of time and space, making it the first algorithm of its kind.
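The idea of dynamic programming over a tree decomposition can be illustrated on the side-chain packing part alone. The sketch below assumes a fixed backbone alignment, an energy that is a sum of self terms and pairwise terms over residues with discrete rotamer labels, and a tree decomposition supplied as input (computing an optimal one is NP-hard in general). Each bag's table records, for every labeling of the residues it shares with its parent bag, the lowest energy achievable in its subtree; the running-intersection property of a tree decomposition is what makes combining child tables consistent. The function and parameter names are hypothetical, and this is not the authors' implementation: in the generalized problem the labels would also have to encode alignment choices, which is not shown here.

```python
from itertools import product


def tree_decomposition_min_energy(domains, self_e, pair_e, bags, children, root=0):
    """Minimize total self + pairwise energy by DP on a tree decomposition (hypothetical API).

    domains[v]           : number of rotamers (labels) for residue v
    self_e[v][a]         : self energy of rotamer a at residue v
    pair_e[(u, v)][a][b] : pairwise energy, one entry per interacting residue pair
    bags[k]              : tuple of residues in bag k
    children[k]          : child bag indices of bag k in the rooted decomposition tree
    Returns (minimum energy, {residue: rotamer}).
    """
    # Charge every energy term to exactly one bag containing all of its residues.
    # next() raises StopIteration if the decomposition fails to cover a term.
    self_terms = [[] for _ in bags]
    pair_terms = [[] for _ in bags]
    for v in range(len(domains)):
        self_terms[next(k for k, b in enumerate(bags) if v in b)].append(v)
    for u, v in pair_e:
        pair_terms[next(k for k, b in enumerate(bags) if u in b and v in b)].append((u, v))

    def solve(k, parent_vars):
        bag = bags[k]
        # Post-order traversal: solve every child subtree first.
        kid_tables = [(c, solve(c, set(bag))) for c in children.get(k, [])]
        sep = sorted(set(bag) & parent_vars)  # residues shared with the parent bag
        table = {}  # separator labels -> (best subtree energy, best subtree assignment)
        for labels in product(*(range(domains[v]) for v in bag)):
            a = dict(zip(bag, labels))
            value = sum(self_e[v][a[v]] for v in self_terms[k])
            value += sum(pair_e[(u, v)][a[u]][a[v]] for u, v in pair_terms[k])
            full = dict(a)
            for c, kid in kid_tables:
                # A child table is keyed by the residues this bag shares with child c.
                shared = sorted(set(bags[c]) & set(bag))
                kid_value, kid_assign = kid[tuple(a[v] for v in shared)]
                value += kid_value
                full.update(kid_assign)
            key = tuple(a[v] for v in sep)
            if key not in table or value < table[key][0]:
                table[key] = (value, full)
        return table

    # The root has no parent, so its table has a single entry keyed by ().
    return solve(root, set())[()]
```

For instance, a four-residue interaction cycle 0-1-2-3-0 has treewidth 2 and can be covered by bags `(0, 1, 2)` and `(0, 2, 3)` with `children = {0: [1]}`. Each table has size equal to the product of the domain sizes in its bag, which is why a decomposition with small bags matters; for brevity, the sketch carries the best subtree assignment in each table entry instead of running a separate traceback pass.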
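Dead-end elimination prunes rotamers that provably cannot appear in the minimum-energy assignment. A minimal sketch of Goldstein's criterion follows, assuming a fixed backbone and an energy that decomposes into self and pairwise terms: rotamer r at residue i is removed if some other rotamer t at i satisfies E(i_r) - E(i_t) + sum over j != i of min_s [E(i_r, j_s) - E(i_t, j_s)] > 0, since then swapping r for t lowers the energy of every conformation containing r. The function names and data layout are hypothetical, and how the algorithm described above integrates dead-end elimination with threading is not shown; this covers only the packing-side criterion.

```python
def pair_energy(pair_e, i, r, j, s):
    """Return E(i_r, j_s); residue pairs absent from the table interact with energy 0."""
    if (i, j) in pair_e:
        return pair_e[(i, j)][r][s]
    if (j, i) in pair_e:
        return pair_e[(j, i)][s][r]
    return 0.0


def goldstein_dee(self_e, pair_e):
    """Iteratively apply Goldstein's DEE criterion; return surviving rotamers per residue.

    self_e[i][r]         : self energy of rotamer r at residue i (hypothetical layout)
    pair_e[(i, j)][r][s] : pairwise energy of rotamer r at residue i and s at residue j
    """
    n = len(self_e)
    alive = [set(range(len(self_e[i]))) for i in range(n)]
    changed = True
    while changed:  # repeat until a full sweep eliminates nothing
        changed = False
        for i in range(n):
            for r in sorted(alive[i]):
                for t in sorted(alive[i] - {r}):
                    # Lower bound on E(conformation with i_r) - E(same with i_t),
                    # taken over all surviving rotamers of the other residues.
                    gap = self_e[i][r] - self_e[i][t]
                    for j in range(n):
                        if j != i:
                            gap += min(pair_energy(pair_e, i, r, j, s)
                                       - pair_energy(pair_e, i, t, j, s)
                                       for s in alive[j])
                    if gap > 0:  # t always beats r, so r is a dead end
                        alive[i].discard(r)
                        changed = True
                        break
    return alive
```

Because the test is strict (`gap > 0`), the rotamers of the global minimum-energy assignment are never eliminated. Pruning shrinks the rotamer domains, which in turn shrinks any dynamic-programming table whose size is the product of domain sizes.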
