Fast and accurate algorithms for protein side-chain packing

This article studies the protein side-chain packing problem using a tree decomposition of the protein structure. To obtain fast and accurate side-chain packing, a protein structure is modeled as a geometric neighborhood graph, which can be easily decomposed into smaller blocks, so the side-chain assignment of the whole protein can be assembled from the assignments of the small blocks. Although we show that the side-chain packing problem remains <i>NP</i>-hard, we obtain a tree-decomposition-based globally optimal algorithm with time complexity <i>O</i>(<i>Nn</i><sub><i>rot</i></sub><sup><i>tw</i> + 1</sup>) and several polynomial-time approximation schemes (PTAS), where <i>N</i> is the number of residues in the protein, <i>n</i><sub><i>rot</i></sub> is the average number of rotamers per residue, and <i>tw</i> = <i>O</i>(<i>N</i><sup>2/3</sup> log <i>N</i>) is the treewidth of the protein structure graph. Experimental results indicate that after Goldstein dead-end elimination is applied, <i>n</i><sub><i>rot</i></sub> is very small and <i>tw</i> equals 3 or 4 most of the time. Based on the globally optimal algorithm, we developed a protein side-chain assignment program, TreePack. On the SCWRL benchmark database, TreePack runs up to 90 times faster than SCWRL 3.0, a widely used side-chain packing program, on some large test proteins, and on average five times faster across all test proteins. There are also some real-world instances that TreePack can solve but SCWRL 3.0 cannot. The TreePack program is available at http://ttic.uchicago.edu/~jinbo/TreePack.htm.
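To make the <i>O</i>(<i>Nn</i><sub><i>rot</i></sub><sup><i>tw</i> + 1</sup>) bound concrete, the following is a minimal Python sketch of exact energy minimization by eliminating residues one at a time, which is closely related to dynamic programming over a tree decomposition. It is not TreePack's implementation. The function name `pack_side_chains`, the input layout (`singles`, `pairs`, `order`) and the toy energies are all assumptions made for illustration. Each elimination step enumerates the rotamers of one residue together with those of its remaining neighbors, so with an elimination order whose separators never exceed <i>tw</i> residues, no step handles more than about <i>n</i><sub><i>rot</i></sub><sup><i>tw</i> + 1</sup> combinations.

```python
from itertools import product

def pack_side_chains(singles, pairs, order):
    """Return (minimum energy, rotamer index per residue).

    singles[i][a]       : self energy of rotamer a at residue i
    pairs[(i, j)][a][b] : interaction energy of rotamer a at i with b at j
    order               : elimination order, a permutation of all residue indices
    """
    n_rot = [len(s) for s in singles]
    # A factor is (scope, table): table maps a rotamer tuple over scope to energy.
    factors = [((i,), {(a,): e for a, e in enumerate(s)})
               for i, s in enumerate(singles)]
    factors += [((i, j), {(a, b): t[a][b]
                          for a in range(n_rot[i]) for b in range(n_rot[j])})
                for (i, j), t in pairs.items()]
    trace = []
    for v in order:
        touching = [f for f in factors if v in f[0]]
        factors = [f for f in factors if v not in f[0]]
        # Residues that still interact with v form the separator for this step.
        scope = tuple(sorted({u for s, _ in touching for u in s} - {v}))
        table, choice = {}, {}
        for assign in product(*(range(n_rot[u]) for u in scope)):
            ctx = dict(zip(scope, assign))
            # Minimise over v's rotamers for this separator assignment.
            table[assign], choice[assign] = min(
                (sum(t[tuple({**ctx, v: a}[u] for u in s)] for s, t in touching), a)
                for a in range(n_rot[v]))
        factors.append((scope, table))
        trace.append((v, scope, choice))
    # Every remaining factor has an empty scope; their sum is the optimum.
    energy = sum(t[()] for _, t in factors)
    # Backtrack in reverse elimination order to recover the optimal rotamers.
    rotamers = {}
    for v, scope, choice in reversed(trace):
        rotamers[v] = choice[tuple(rotamers[u] for u in scope)]
    return energy, [rotamers[i] for i in range(len(singles))]

# Toy energies (hypothetical): 4 residues, a chain 0-1-2-3 plus the edge (0, 2).
singles = [[0, 2], [1, 0, 3], [2, 0], [0, 1]]
pairs = {
    (0, 1): [[3, 0, 1], [0, 2, 5]],
    (1, 2): [[0, 4], [2, 1], [0, 0]],
    (2, 3): [[1, 0], [0, 3]],
    (0, 2): [[0, 2], [1, 0]],
}
energy, rotamers = pack_side_chains(singles, pairs, order=[3, 0, 1, 2])
print(energy, rotamers)
```

In this sketch the elimination order is supplied by the caller. In practice it would come from a tree decomposition or a heuristic such as minimum degree, and the quality of that order determines how large the intermediate tables become.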
