An Algorithm for Simultaneous Backbone Threading and Side-Chain Packing

Abstract To fully utilize all available information in protein structure prediction, including both backbone and side-chain structures, we present a novel algorithm for solving a generalized threading problem, in which backbone threading and side-chain packing are considered simultaneously during structure prediction. Given a query protein sequence and a template structure, our goal is to find a threading alignment between the query sequence and the template structure, along with a rotamer assignment for each side-chain of the query protein, that optimizes an energy function combining a backbone threading energy and a side-chain packing energy. We solve this computationally challenging problem by first formulating it as a graph-based optimization problem. Various graph-theoretic techniques, which take advantage of a number of special properties of the graph representing this generalized threading problem, are employed to make our algorithm efficient enough to be practically useful. The overall framework of our algorithm is dynamic programming on an optimal tree decomposition of the graph representation of the problem. By using additional heuristic techniques such as dead-end elimination, we demonstrate that our algorithm, the first of its kind, can solve a generalized threading problem within a practically acceptable amount of time and space.
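To make the dead-end elimination step concrete, the following is a minimal sketch of the standard Goldstein pruning criterion applied to rotamer selection alone. It is an illustration under stated assumptions, not the authors' implementation: the function name `goldstein_dee`, the energy layout (`self_e[i][r]` for self energies, `pair_e[(i, j)][r][s]` with `i < j` for pairwise energies), and the toy numbers are all hypothetical, and the sketch ignores the threading alignment entirely.

```python
from typing import Dict, List, Set, Tuple

# Assumed (hypothetical) energy layout:
#   self_e[i][r]          self energy of rotamer r at position i
#   pair_e[(i, j)][r][s]  pair energy of rotamer r at i and rotamer s at j, i < j
PairTable = Dict[Tuple[int, int], List[List[float]]]


def pair_energy(pair_e: PairTable, i: int, r: int, j: int, s: int) -> float:
    """Symmetric lookup; positions with no stored table do not interact."""
    if i < j:
        table = pair_e.get((i, j))
        return table[r][s] if table is not None else 0.0
    table = pair_e.get((j, i))
    return table[s][r] if table is not None else 0.0


def goldstein_dee(self_e: List[List[float]], pair_e: PairTable) -> List[Set[int]]:
    """Repeatedly remove rotamers that cannot be part of a minimum-energy assignment.

    Rotamer r at position i is removed if some other surviving rotamer t satisfies
        E(i_r) - E(i_t) + sum_j min_s [E(i_r, j_s) - E(i_t, j_s)] > 0,
    where s ranges over the surviving rotamers at position j.
    """
    n = len(self_e)
    alive = [set(range(len(rots))) for rots in self_e]
    changed = True
    while changed:
        changed = False
        for i in range(n):
            for r in sorted(alive[i]):
                for t in sorted(alive[i]):
                    if t == r:
                        continue
                    gap = self_e[i][r] - self_e[i][t]
                    for j in range(n):
                        if j == i:
                            continue
                        gap += min(
                            pair_energy(pair_e, i, r, j, s)
                            - pair_energy(pair_e, i, t, j, s)
                            for s in alive[j]
                        )
                    if gap > 0:
                        # t is better than r in every context, so r is a dead end.
                        alive[i].discard(r)
                        changed = True
                        break
    return alive


# Toy instance: two positions with two rotamers each (made-up energies).
toy_self = [[0.0, 5.0], [0.0, 0.0]]
toy_pair = {(0, 1): [[0.0, 1.0], [1.0, 0.0]]}
survivors = goldstein_dee(toy_self, toy_pair)
```

Pruning of this kind shrinks the number of candidate rotamers per residue before the dynamic programming over the tree decomposition runs, which matters because the cost of that dynamic programming grows with the number of states per graph vertex.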
