Rapid Protein Side-Chain Packing via Tree Decomposition

This paper proposes a novel tree-decomposition-based side-chain assignment algorithm that finds the globally optimal solution of the side-chain packing problem very efficiently. Theoretically, the computational complexity of this algorithm is $O((N + M)n_{rot}^{tw+1})$, where N is the number of residues in the protein, M is the number of interacting residue pairs, $n_{rot}$ is the average number of rotamers per residue, and tw is the tree width of the residue interaction graph. Based on this algorithm, we have developed a side-chain prediction program, SCATD (Side Chain Assignment via Tree Decomposition). Experimental results show that after Goldstein dead-end elimination (DEE) is applied, $n_{rot}$ is around 3.5, and tw is only 3 or 4 for most of the test proteins in the SCWRL benchmark and less than 10 for all of them. SCATD runs up to 90 times faster than SCWRL 3.0 on some large proteins in the SCWRL benchmark and is on average five times faster across all the test proteins. If only the post-DEE stage is considered, our tree-decomposition-based energy minimization algorithm is more than 200 times faster than the corresponding stage of SCWRL 3.0 on some large proteins. SCATD is freely available for academic research upon request.
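
To make the $n_{rot}^{tw+1}$ factor concrete, the following is a minimal Python sketch of min-sum variable elimination, which is equivalent to dynamic programming over the tree decomposition induced by an elimination order. It is an illustration under stated assumptions, not the SCATD implementation: the function names, the data layout, and the toy energies are all hypothetical, and the elimination order is supplied by hand rather than computed by a tree-decomposition heuristic.

```python
from itertools import product

def min_energy_by_elimination(n_rot, singleton, pairwise, order):
    """Return (minimum total energy, induced width) for an elimination order.

    n_rot:     dict residue -> number of rotamers
    singleton: dict residue -> list of self energies, one per rotamer
    pairwise:  dict (i, j) -> table with pairwise[(i, j)][r_i][r_j]
    order:     list of all residues, in the order they are eliminated
    """
    # Each factor is (scope, table); table maps a rotamer tuple to an energy.
    factors = [((i,), {(r,): e[r] for r in range(n_rot[i])})
               for i, e in singleton.items()]
    for (i, j), e in pairwise.items():
        factors.append(((i, j), {(a, b): e[a][b]
                                 for a in range(n_rot[i])
                                 for b in range(n_rot[j])}))
    largest_bag = 0
    for v in order:
        touching = [f for f in factors if v in f[0]]
        factors = [f for f in factors if v not in f[0]]
        # The new table ranges over every residue that shares a factor with v.
        scope = tuple(sorted({u for s, _ in touching for u in s} - {v}))
        largest_bag = max(largest_bag, len(scope) + 1)
        table = {}
        for assign in product(*(range(n_rot[u]) for u in scope)):
            ctx = dict(zip(scope, assign))
            best = float("inf")
            for r in range(n_rot[v]):  # minimise over the rotamers of v
                ctx[v] = r
                total = sum(t[tuple(ctx[u] for u in s)] for s, t in touching)
                best = min(best, total)
            table[assign] = best
        factors.append((scope, table))
    # Every residue is eliminated, so all remaining factors have empty scope.
    return sum(t[()] for _, t in factors), largest_bag - 1

def brute_force_min_energy(n_rot, singleton, pairwise):
    """Reference answer: enumerate every rotamer assignment."""
    residues = sorted(n_rot)
    best = float("inf")
    for assign in product(*(range(n_rot[i]) for i in residues)):
        a = dict(zip(residues, assign))
        e = sum(singleton[i][a[i]] for i in residues)
        e += sum(t[a[i]][a[j]] for (i, j), t in pairwise.items())
        best = min(best, e)
    return best

# Toy instance (made-up energies): four residues whose interaction graph
# is the cycle 0-1-2-3-0, with two rotamers per residue.
N_ROT = {0: 2, 1: 2, 2: 2, 3: 2}
SINGLETON = {0: [0, 2], 1: [1, 0], 2: [3, 1], 3: [0, 0]}
PAIRWISE = {
    (0, 1): [[5, 0], [1, 4]],
    (1, 2): [[0, 3], [2, 0]],
    (2, 3): [[1, 1], [0, 6]],
    (0, 3): [[2, 0], [0, 3]],
}

energy, width = min_energy_by_elimination(N_ROT, SINGLETON, PAIRWISE, [0, 1, 2, 3])
print(energy, width, brute_force_min_energy(N_ROT, SINGLETON, PAIRWISE))
```

Each elimination step fills a table over at most width + 1 residues, which is where the $n_{rot}^{tw+1}$ term comes from; brute force instead enumerates on the order of $n_{rot}^{N}$ assignments. The sketch returns only the minimum energy, and recovering the optimal rotamers would additionally require storing the minimizing rotamer for each table entry and tracing back.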
