Reconstruction of the protein structures from contact maps

The contact map of a protein is a distinctive signature of its folded structure and contains information on all the subtle interactions among the residues responsible for protein stability and function. It is a common belief that an efficient prediction of protein contact maps from protein chains will help in finding solutions to the protein folding problem. It is therefore urgent to develop tools that reconstruct the three-dimensional structure of a protein from its contact map. In this paper we address this problem and describe an efficient and very fast procedure. We show that our method reconstructs all the protein structures of our data set with zero contact map errors, irrespective of the threshold adopted in the contact definition (from 7 to 18 Angstrom). To the best of our knowledge, none of the methods previously described for the same task achieves similar efficiency. Moreover, this result is obtained on a non-redundant data set of 1760 proteins, by far the largest data set used for this purpose. The algorithm is very fast, with an average execution time that ranges from 3 to 30 seconds depending on the threshold adopted when computing the contact map. Finally, we show that it is possible to reconstruct protein structures that completely satisfy the native contact maps (zero errors) but are up to 40 Angstrom RMSD away from the native 3D structures. Our analysis shows that contact maps computed at thresholds ranging from 12 to 18 Angstrom allow better 3D structure recovery than those computed at lower thresholds.

Contact: margara@cs.unibo.it, vassura@cs.unibo.it, casadio@alma.unibo.it
1. Computer Science Department, University of Bologna.
2. Biocomputing Group, Department of Biology, University of Bologna.
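As an illustration of the quantities discussed in the abstract, the following minimal Python sketch computes a contact map at a given distance threshold and counts the contact map errors between two structures. It is not the reconstruction procedure of the paper. The function names `contact_map` and `contact_map_errors`, the use of one point per residue, the `<=` comparison, and the toy coordinates are all assumptions made for the example. The mirror image of a structure preserves every pairwise distance, so it gives one simple case in which a different 3D structure satisfies the native contact map with zero errors.

```python
import math
from itertools import combinations


def contact_map(coords, threshold):
    """Return the set of residue pairs (i, j), i < j, within `threshold` Angstrom.

    Assumption: each residue is represented by a single 3D point.
    """
    return {
        (i, j)
        for (i, a), (j, b) in combinations(enumerate(coords), 2)
        if math.dist(a, b) <= threshold
    }


def contact_map_errors(native, rebuilt, threshold):
    """Count residue pairs whose contact state differs between two structures."""
    return len(contact_map(native, threshold) ^ contact_map(rebuilt, threshold))


# Toy trace with hypothetical coordinates (Angstrom); not a real protein.
native = [
    (0.0, 0.0, 0.0),
    (3.8, 0.0, 0.0),
    (5.0, 3.6, 0.0),
    (3.0, 5.0, 3.0),
    (0.0, 4.0, 5.0),
]

# Mirror image through the xy-plane: all pairwise distances are unchanged.
mirror = [(x, y, -z) for x, y, z in native]

for threshold in (7.0, 12.0, 18.0):
    n_contacts = len(contact_map(native, threshold))
    n_errors = contact_map_errors(native, mirror, threshold)
    print(f"threshold={threshold}: contacts={n_contacts}, errors vs mirror={n_errors}")
```

The set-based representation keeps the error count a simple symmetric difference; for proteins with hundreds of residues a dense matrix would be the more usual choice.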
