Clique-based algorithms for protein threading with profiles and constraints

Protein threading with profiles, in which constraints on distances between residues are given, is known to be NP-hard. A simple algorithm known as CLIQUETHREAD, based on an efficient reduction to the maximum edge-weight clique problem, is known to be a practical algorithm for solving the protein threading problem with profiles and constraints. However, this algorithm is not efficient enough to be applicable to large-scale threading prediction, and it was presented only for profile threading with strict constraints. This paper presents a more efficient algorithm, FTHREAD, for profile threading with strict constraints, which is more than times faster than CLIQUETHREAD for larger proteins. We also present a novel practical algorithm, NTHREAD, for profile threading with non-strict constraints. A comparison of FTHREAD with existing state-of-the-art methods shows that, although our algorithm uses a simple threading function, it performs as well as these existing methods for protein threading. In addition, our computational experiments on sequence-structure alignments for a number of proteins show better results for threading with non-strict constraints than for threading with strict constraints. We have also analyzed the effects of using a number of distance constraints.
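The reduction to maximum edge-weight clique mentioned above can be illustrated with a toy sketch. This is not the CLIQUETHREAD, FTHREAD, or NTHREAD implementation; it rests on several simplifying assumptions that are not taken from the abstract: each vertex is a hypothetical (template position, sequence position) pair, two vertices are joined when they preserve order and satisfy a user-supplied `compatible` predicate standing in for strict distance constraints, each vertex's profile score is split evenly over its `m - 1` edges in a full alignment, and the clique search is plain exhaustive enumeration rather than an efficient branch-and-bound. The names `build_graph`, `max_edge_weight_clique`, and `thread` are illustrative only.

```python
from itertools import combinations


def build_graph(score, compatible):
    """Build the alignment graph (assumed construction, for illustration).

    score[i][j] is the profile score of aligning template position i with
    sequence position j. compatible(i, j, k, l) says whether aligning i->j
    and k->l together violates no distance constraint.
    """
    m, n = len(score), len(score[0])
    if m < 2:
        raise ValueError("this sketch needs at least two template positions")
    vertices = [(i, j) for i in range(m) for j in range(n)]
    weight = {}
    # combinations() keeps lexicographic order, so the first vertex of each
    # pair always precedes the second one in `vertices`.
    for (i, j), (k, l) in combinations(vertices, 2):
        if i < k and j < l and compatible(i, j, k, l):
            # Spread vertex scores over edges: in a clique of size m every
            # vertex touches m - 1 edges, so the edge weights sum to the
            # total profile score of the alignment.
            weight[(i, j), (k, l)] = (score[i][j] + score[k][l]) / (m - 1)
    return vertices, weight


def max_edge_weight_clique(vertices, weight):
    """Exhaustively enumerate cliques and keep the heaviest one."""
    best_total, best_clique = 0.0, []

    def extend(clique, total, candidates):
        nonlocal best_total, best_clique
        for idx, v in enumerate(candidates):
            # Every candidate is adjacent to all current clique members.
            new_total = total + sum(weight[u, v] for u in clique)
            new_clique = clique + [v]
            if new_total > best_total:
                best_total, best_clique = new_total, new_clique
            rest = [w for w in candidates[idx + 1:] if (v, w) in weight]
            extend(new_clique, new_total, rest)

    extend([], 0.0, vertices)
    return best_total, best_clique


def thread(score, compatible):
    """Return (total score, alignment) for the toy threading instance."""
    vertices, weight = build_graph(score, compatible)
    return max_edge_weight_clique(vertices, weight)


if __name__ == "__main__":
    toy_score = [[3, 1, 0],
                 [0, 2, 5]]
    print(thread(toy_score, lambda i, j, k, l: True))
```

Note that this sketch does not force the clique to contain one vertex per template position; with non-positive scores the heaviest clique may be a partial alignment. The exhaustive search is exponential in the number of vertices, which is only acceptable for toy inputs.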
