Defining an Essence of Structure Determining Residue Contacts in Proteins

The network of native non-covalent residue contacts determines the three-dimensional structure of a protein. However, not all contacts are of equal structural significance, and little is known about a minimal, yet sufficient, subset required to define the global features of a protein. Characterisation of this “structural essence” has remained elusive so far: no algorithmic strategy devised to date has outperformed a random selection in terms of 3D reconstruction accuracy (measured as the Cα RMSD). Identifying the number and nature of essential native contacts is not only of theoretical interest (i.e., for the design of advanced statistical potentials); such a subset of spatial constraints is also very useful for a number of novel experimental methods (such as EPR) that rely heavily on constraint-based protein modelling. To derive accurate three-dimensional models from distance constraints, we implemented a reconstruction pipeline using distance geometry. We selected a test set of 12 protein structures from the four major SCOP fold classes and performed our reconstruction analysis on them. As a reference set, a series of random subsets (ranging from 10% to 90% of native contacts) was generated for each protein, and the reconstruction accuracy was computed for each subset. We have developed a rational strategy, termed “cone-peeling”, that combines sequence features and network descriptors to select minimal subsets that outperform the reference sets. We present, for the first time, a rational strategy to derive a structural essence of residue contacts and provide an estimate of the size of this minimal subset. Our algorithm computes sparse subsets capable of determining the tertiary structure at approximately 4.8 Å Cα RMSD with as few as 8% of the native contacts (Cα-Cα and Cβ-Cβ). By comparison, a randomly chosen subset of native contacts needs about twice as many contacts to reach the same level of accuracy.
This “structural essence” opens new avenues in the fields of structure prediction, empirical potentials and docking.
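
The random reference sets described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names `native_contacts` and `random_reference_subsets`, the 8 Å cutoff, the minimum sequence separation, and the number of repeats per fraction are all assumptions chosen for illustration, and the distance-geometry reconstruction and RMSD evaluation steps are not shown.

```python
import math
import random
from itertools import combinations


def native_contacts(coords, cutoff=8.0, min_seq_sep=3):
    """Return residue pairs (i, j), i < j, whose atoms lie within `cutoff` Å.

    `cutoff` and `min_seq_sep` are illustrative assumptions, not values
    taken from the paper.
    """
    return [
        (i, j)
        for i, j in combinations(range(len(coords)), 2)
        if j - i >= min_seq_sep and math.dist(coords[i], coords[j]) <= cutoff
    ]


def random_reference_subsets(contacts, fractions, n_repeats=5, seed=0):
    """Draw `n_repeats` random contact subsets for each fraction of the full set."""
    rng = random.Random(seed)  # fixed seed so the reference sets are reproducible
    subsets = {}
    for frac in fractions:
        k = max(1, round(frac * len(contacts)))  # keep at least one contact
        subsets[frac] = [sorted(rng.sample(contacts, k)) for _ in range(n_repeats)]
    return subsets


if __name__ == "__main__":
    # Toy helix-like trace standing in for real C-alpha coordinates.
    coords = [
        (2.3 * math.cos(1.745 * i), 2.3 * math.sin(1.745 * i), 1.5 * i)
        for i in range(30)
    ]
    contacts = native_contacts(coords)
    fractions = [f / 10 for f in range(1, 10)]  # 10% to 90% of native contacts
    reference = random_reference_subsets(contacts, fractions)
    for frac in fractions:
        print(f"{frac:.0%}: {len(reference[frac][0])} of {len(contacts)} contacts")
```

In a full analysis, each subset would be passed to the reconstruction step and the resulting model compared with the native structure by Cα RMSD; a selection strategy is then judged by whether it reaches a given accuracy with fewer contacts than these random baselines.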
