Geometric Sieving: Automated Distributed Optimization of 3D Motifs for Protein Function Prediction

Determining the function of all proteins is a recurring theme in modern biology and medicine, but the sheer number of proteins makes experimental approaches impractical. For this reason, current efforts have considered in silico function prediction to guide and accelerate function determination. One approach to predicting protein function is to search functionally uncharacterized protein structures (targets) for substructures (matches) that are geometrically and chemically similar to known active sites (motifs). Finding a match can imply that the target has an active site similar to the motif, suggesting functional homology. An effective function predictor requires effective motifs: motifs whose geometric and chemical characteristics are detected by comparison algorithms within functionally homologous targets (sensitive motifs) and are not detected within functionally unrelated targets (specific motifs). Designing effective motifs is a difficult open problem. Current approaches select and combine structural, physical, and evolutionary properties to design motifs that mirror functional characteristics of active sites. We present a new approach, Geometric Sieving (GS), which refines candidate motifs into optimized motifs with maximal geometric and chemical dissimilarity from all known protein structures. The paper discusses both the usefulness and the efficiency of GS. We show that candidate motifs from six well-studied proteins, including α-Chymotrypsin, Dihydrofolate Reductase, and Lysozyme, can be optimized with GS into motifs that are among the most sensitive and specific motifs possible for those candidate motifs. For the same proteins, we also report results that relate evolutionarily important motifs to motifs that exhibit maximal geometric and chemical dissimilarity from all known protein structures. Our current observations show that GS is a powerful tool that can complement existing work on motif design and protein function prediction.
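The notions of a sensitive and a specific motif used above can be made concrete with a small calculation. The following is a minimal sketch, not the paper's implementation: it assumes that a comparison algorithm has already reported which targets contain a match, and that the set of functionally homologous targets is known. The function name `sensitivity_specificity` and the target identifiers are hypothetical.

```python
def sensitivity_specificity(matched_ids, homolog_ids, all_target_ids):
    """Return (sensitivity, specificity) of one motif over a set of targets.

    matched_ids    -- targets in which the motif was matched (assumed input)
    homolog_ids    -- targets known to be functionally homologous to the motif
    all_target_ids -- every target that was searched
    """
    targets = set(all_target_ids)
    matched = set(matched_ids) & targets
    homologs = set(homolog_ids) & targets
    unrelated = targets - homologs

    # Sensitivity: fraction of homologous targets in which the motif matched.
    true_pos = len(matched & homologs)
    sensitivity = (true_pos / len(homologs)) if homologs else 0.0

    # Specificity: fraction of unrelated targets in which the motif did NOT match.
    false_pos = len(matched & unrelated)
    specificity = (1.0 - false_pos / len(unrelated)) if unrelated else 0.0

    return sensitivity, specificity


# Hypothetical example: 10 targets, 4 homologs, 4 reported matches.
all_targets = [f"t{i}" for i in range(1, 11)]
homologs = ["t1", "t2", "t3", "t4"]
matches = ["t1", "t2", "t3", "t7"]
sens, spec = sensitivity_specificity(matches, homologs, all_targets)
```

In this hypothetical example the motif matches three of the four homologs and one of the six unrelated targets, so an optimized motif would aim to raise both fractions toward 1.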
