Fault Tolerance for Large Scale Protein 3D Reconstruction from Contact Maps

In this paper we describe FT-COMAR, an algorithm that improves the fault tolerance of COMAR, the heuristic algorithm we previously described for protein reconstruction [10]. COMAR (Contact Map Reconstruction) can reconstruct the three-dimensional (3D) structure of a real protein from its contact map with 100% efficiency when tested on 1760 proteins from different structural classes. Here we test the performance of COMAR on native contact maps perturbed with random errors, in order to simulate possible scenarios of reconstruction from predicted (and therefore highly noisy) contact maps. Our analysis shows that the algorithm reconstructs blurred contact maps better when contacts are underpredicted than when they are overpredicted. Moreover, we modify the algorithm into FT-COMAR (Fault Tolerant COMAR) so that it can be used with incomplete contact maps. FT-COMAR can ignore up to 75% of the contact map and still recover, from the remaining 25% of the entries, a 3D structure whose root mean square deviation (RMSD) from the native one is less than 4 Å. Our results indicate that the quality of the predicted contacts matters more than their quantity for protein 3D reconstruction, and that some hints about "unsafe" areas in the predicted contact maps can be useful to improve reconstruction quality. To this end, we implement a very simple filtering procedure to detect unsafe areas in contact maps, and we show that, in the presence of errors, this filtering can significantly improve the performance of the algorithm. Furthermore, we show that both COMAR and FT-COMAR outperform a previous state-of-the-art algorithm for the same task [13].
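The three kinds of input described above (a native contact map, a map perturbed with random errors, and an incomplete map) can be illustrated with a minimal Python sketch. It is not the COMAR or FT-COMAR implementation: the function names, the 8 Å contact threshold, the use of `None` to mark ignored entries, and the toy coordinates are all assumptions made for illustration, and neither the reconstruction step nor the RMSD evaluation is shown.

```python
import math
import random


def contact_map(coords, threshold=8.0):
    """Binary contact map: entry (i, j) is 1 when residues i and j are
    closer than `threshold` angstroms (threshold value is an assumption)."""
    n = len(coords)
    cmap = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) < threshold:
                cmap[i][j] = cmap[j][i] = 1
    return cmap


def perturb(cmap, n_errors, kind, rng):
    """Return a copy with `n_errors` random symmetric errors.
    kind="under" deletes true contacts (underprediction);
    kind="over" adds false contacts (overprediction)."""
    n = len(cmap)
    target = 1 if kind == "under" else 0
    candidates = [(i, j) for i in range(n) for j in range(i + 1, n)
                  if cmap[i][j] == target]
    noisy = [row[:] for row in cmap]
    for i, j in rng.sample(candidates, min(n_errors, len(candidates))):
        noisy[i][j] = noisy[j][i] = 1 - target
    return noisy


def mask_entries(cmap, fraction, rng):
    """Return a copy in which a random `fraction` of the residue pairs
    is marked unknown (None), i.e. ignored by the reconstruction."""
    n = len(cmap)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    masked = [row[:] for row in cmap]
    for i, j in rng.sample(pairs, int(fraction * len(pairs))):
        masked[i][j] = masked[j][i] = None
    return masked


def count_contacts(cmap):
    """Number of contacts above the diagonal (unknown entries are skipped)."""
    n = len(cmap)
    return sum(1 for i in range(n) for j in range(i + 1, n) if cmap[i][j] == 1)


# Toy chain: 10 points spaced 3.8 angstroms apart on a straight line.
rng = random.Random(0)
coords = [(3.8 * i, 0.0, 0.0) for i in range(10)]
native = contact_map(coords)
under = perturb(native, 5, "under", rng)
over = perturb(native, 5, "over", rng)
partial = mask_entries(native, 0.75, rng)
```

On this toy chain, `under` has five fewer contacts than `native`, `over` has five more, and `partial` keeps only a quarter of the residue pairs as known entries, which mirrors the three test scenarios in the text.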

[1] Piero Fariselli, et al. Reconstruction of 3D Structures From Protein Contact Maps, 2008, IEEE ACM Trans. Comput. Biol. Bioinform.

[2] Eytan Domany, et al. Protein folding using contact maps, 2000.

[3] Timothy F. Havel. Distance Geometry: Theory, Algorithms, and Chemical Applications, 2002.

[4] Tim J. P. Hubbard, et al. SCOP database in 2004: refinements integrate structure and sequence family data, 2004, Nucleic Acids Res.

[5] P. Fariselli, et al. Progress in predicting inter-residue contacts of proteins with neural networks and correlated mutations, 2001, Proteins.

[6] M. Vendruscolo, et al. Recovery of protein structure from contact maps, 1997, Folding & Design.

[7] Piero Fariselli, et al. Reconstruction of the protein structures from contact maps, 2006.

[8] David G. Kirkpatrick, et al. Unit disk graph recognition is NP-hard, 1998, Comput. Geom.

[9] Piero Fariselli, et al. The pros and cons of predicting protein contact maps, 2008, Methods in Molecular Biology.

[10] Pierre Baldi, et al. Modular DAG-RNN Architectures for Assembling Coarse Protein Structures, 2006, J. Comput. Biol.

[11] S. Brunak, et al. Protein structures from distance inequalities, 1993, Journal of Molecular Biology.

[12] Arthur M. Lesk. Introduction to bioinformatics, 2002.

[13] Garland R. Marshall, et al. Properties of intraglobular contacts in proteins: an approach to prediction of tertiary structure, 1994, Proceedings of the Twenty-Seventh Hawaii International Conference on System Sciences.