Identification of Sequence-Specific Tertiary Packing Motifs in Protein Structures using Delaunay Tessellation

An approach to recognizing recurrent sequence-structure patterns in proteins has been developed, based on Delaunay tessellation of protein structure. Starting with a united-residue (side chain centroid) representation of a protein structure, tessellation partitions the structure into a unique set of irregular tetrahedra, or simplices, whose vertices correspond to four nearest-neighbor residues. Tetrahedral clusters composed of residues not adjacent along the polypeptide chain have been classified according to their amino acid composition and the three distances separating the residues along the sequence, defined as the sequence lengths from the first to the second, the second to the third, and the third to the fourth residue. An elementary tertiary packing motif is defined as a Delaunay simplex with a specific amino acid composition, together with the three sequence distances (i.e., the numbers of residues along the sequence) between vertex residues. Analysis of three databases of diverse protein structures (< 30% sequence identity between any pair, 1922 structures total) identified 224 motifs, each found in at least two proteins from different fold families. To further substantiate the methodology, three groups of proteins representing unique structural and functional families were analyzed, and packing motifs characteristic of each group were identified. The proposed methodology is termed Simplicial Neighborhood Analysis of Protein Packing (SNAPP). SNAPP can be used to locate recurrent tertiary structural motifs as well as sequence-specific, functionally relevant patterns similar to Prosite (Hofmann et al., 1999) signatures. We anticipate that the SNAPP methodology will be useful in automating the analysis and comparison of protein structures determined in structural and functional genomics projects.
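The motif definition above can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on stated assumptions: the simplices are taken as given (for example, they could be obtained from `scipy.spatial.Delaunay(points).simplices` applied to side chain centroid coordinates); "not adjacent along the chain" is read as no two vertices being sequence neighbors; the composition is stored as sorted one-letter codes; and "different fold families" is read as at least two distinct family labels. The function names `motif_key` and `recurrent_motifs` and the toy data are hypothetical.

```python
from collections import defaultdict


def motif_key(simplex, sequence):
    """Return (composition, (d1, d2, d3)) for one simplex.

    simplex  -- four residue indices (0-based positions in `sequence`)
    sequence -- one-letter amino acid string
    Returns None when any two vertices are neighbors along the chain
    (assumed reading of "not adjacent along the polypeptide chain").
    """
    i, j, k, l = sorted(simplex)
    # Sequence distances: first to second, second to third, third to fourth.
    gaps = (j - i, k - j, l - k)
    if min(gaps) <= 1:
        return None
    # Assumption: composition is order-independent, so sort the letters.
    composition = "".join(sorted(sequence[p] for p in (i, j, k, l)))
    return composition, gaps


def recurrent_motifs(proteins, min_families=2):
    """Collect motifs seen in at least `min_families` distinct fold families.

    proteins -- iterable of (fold_family, sequence, simplices) tuples
    """
    families = defaultdict(set)
    for family, sequence, simplices in proteins:
        for simplex in simplices:
            key = motif_key(simplex, sequence)
            if key is not None:
                families[key].add(family)
    return {k: v for k, v in families.items() if len(v) >= min_families}


# Toy data (hypothetical): two short chains with hard-coded simplices.
proteins = [
    ("fold_A", "ACDEFGHIKL", [(0, 3, 6, 9), (0, 1, 4, 8)]),
    ("fold_B", "AXXEXXHXXL", [(0, 3, 6, 9)]),
]
shared = recurrent_motifs(proteins)
print(shared)
```

In this toy input, the simplex `(0, 1, 4, 8)` is discarded because residues 0 and 1 are chain neighbors, while the composition `AEHL` with distances `(3, 3, 3)` occurs in both fold families and is therefore retained.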

[1]  M J Sippl, et al.  Knowledge-based potentials for proteins, 1995, Current opinion in structural biology.

[2]  Chris Sander, et al.  Touring protein fold space with Dali/FSSP, 1998, Nucleic Acids Res.

[3]  A G Murzin, et al.  SCOP: a structural classification of proteins database for the investigation of sequences and structures, 1995, Journal of molecular biology.

[4]  P. Wolynes, et al.  Self‐consistently optimized statistical mechanical energy functions for sequence structure alignment, 1996, Protein science : a publication of the Protein Society.

[5]  J. Thornton, et al.  Determinants of strand register in antiparallel β‐sheets of proteins, 1998, Protein science : a publication of the Protein Society.

[6]  C. Chothia.  Structural invariants in protein folding, 1975, Nature.

[7]  G J Pielak, et al.  Patterned library analysis: a method for the quantitative assessment of hypotheses concerning the determinants of protein structure, 1999, Proceedings of the National Academy of Sciences of the United States of America.

[8]  A Tropsha, et al.  A new approach to protein fold recognition based on Delaunay tessellation of protein structure, 1997, Pacific Symposium on Biocomputing.

[9]  U. Hobohm, et al.  Selection of representative protein data sets, 1992, Protein science : a publication of the Protein Society.

[10]  T. Schlick, et al.  Generating folded protein structures with a lattice chain growth algorithm, 2000.

[11]  R. Jernigan, et al.  Residue-residue potentials with a favorable contact pair term and an unfavorable high packing density term, for simulation and threading, 1996, Journal of molecular biology.

[12]  D. F. Watson.  Computing the n-Dimensional Delaunay Tesselation with Application to Voronoi Polytopes, 1981, Comput. J.

[13]  K. Gernert, et al.  Puzzle pieces defined: locating common packing units in tertiary protein contacts, 1996, Pacific Symposium on Biocomputing.

[14]  C. Sander, et al.  Verification of protein structures : Side-chain planarity, 1996.

[15]  I. Jonassen, et al.  Discovery of local packing motifs in protein structures, 1999, Proteins.

[16]  T Schlick, et al.  Lattice protein folding with two and four‐body statistical potentials, 2001, Proteins.

[17]  Amos Bairoch, et al.  The PROSITE database, its status in 1999, 1999, Nucleic Acids Res.

[18]  David C. Jones, et al.  Potential energy functions for threading, 1996, Current opinion in structural biology.

[19]  S. Bryant, et al.  An empirical energy function for threading protein sequence through the folding motif, 1993, Proteins.

[20]  I D Kuntz, et al.  A rapid method for exploring the protein structure universe, 1999, Proteins.

[21]  Thomas L. Madden, et al.  Gapped BLAST and PSI-BLAST: a new generation of protein database search programs, 1997, Nucleic acids research.

[22]  M J Sippl, et al.  Structure-derived hydrophobic potential. Hydrophobic potential derived from X-ray structures of globular proteins is able to identify native folds, 1992, Journal of molecular biology.

[23]  C. Chothia, et al.  Structure of proteins: packing of alpha-helices and pleated sheets, 1977, Proceedings of the National Academy of Sciences of the United States of America.

[24]  M. Sippl, et al.  Detection of native‐like models for amino acid sequences of unknown three‐dimensional structure in a data base of known protein conformations, 1992, Proteins.

[25]  Cyrus Chothia, et al.  Packing of α-Helices onto β-Pleated sheets and the anatomy of αβ proteins, 1980.

[26]  C. Chothia, et al.  Helix to helix packing in proteins, 1981, Journal of molecular biology.

[27]  Franz Aurenhammer.  Voronoi diagrams: a survey of a fundamental geometric data structure, 1991, CSUR.

[28]  A. Tropsha, et al.  Four-body potentials reveal protein-specific correlations to stability changes caused by hydrophobic core mutations, 2001, Journal of molecular biology.

[29]  F. Richards.  The interpretation of protein structures: total volume, group volume distributions and packing density, 1974, Journal of molecular biology.

[30]  Amos Bairoch, et al.  The PROSITE database, its status in 2002, 2002, Nucleic Acids Res.

[31]  G. Crippen, et al.  Contact potential that recognizes the correct folding of globular proteins, 1992, Journal of molecular biology.

[32]  Shmuel Pietrokovski, et al.  Blocks+: a non-redundant database of protein alignment blocks derived from multiple compilations, 1999, Bioinform.

[33]  J. L. Finney, et al.  Random packings and the structure of simple liquids. I. The geometry of random close packing, 1970, Proceedings of the Royal Society of London. A. Mathematical and Physical Sciences.

[34]  M Gerstein, et al.  Volume changes on protein folding, 1994, Structure.

[35]  Atsuyuki Okabe, et al.  Spatial Tessellations: Concepts and Applications of Voronoi Diagrams, 1992, Wiley Series in Probability and Mathematical Statistics.

[36]  P. Munson, et al.  Statistical significance of hierarchical multi‐body potentials based on Delaunay tessellation and their application in sequence‐structure alignment, 1997, Protein science : a publication of the Protein Society.

[37]  A. Godzik, et al.  Sequence-structure matching in globular proteins: application to supersecondary and tertiary structure determination, 1992, Proceedings of the National Academy of Sciences of the United States of America.

[38]  Iosif I. Vaisman, et al.  Delaunay Tessellation of Proteins: Four Body Nearest-Neighbor Propensities of Amino Acid Residues, 1996, J. Comput. Biol.

[39]  H. Wako, et al.  Novel method to detect a motif of local structures in different protein conformations, 1998, Protein engineering.

[40]  A Tropsha, et al.  Statistical geometry analysis of proteins: implications for inverted structure prediction, 1996, Pacific Symposium on Biocomputing.

[41]  M. Levitt, et al.  The volume of atoms on the protein surface: calculated from simulation, using Voronoi polyhedra, 1995, Journal of molecular biology.