Analysis of substructural variation in families of enzymatic proteins with applications to protein function prediction

Background

Structural variations caused by a wide range of physico-chemical and biological sources directly influence the function of a protein. For enzymatic proteins, the structure and chemistry of the catalytic binding site residues can be loosely defined as a substructure of the protein. Comparative analysis of drug-receptor substructures across and within species has been used for lead evaluation. Substructure-level similarity between the binding sites of functionally similar proteins has also been used to identify instances of convergent evolution among proteins. In functionally homologous protein families, shared chemistry and geometry at catalytic sites provide a common, local point of comparison among proteins that may differ significantly at the sequence, fold, or domain topology levels.

Results

This paper describes two key results that can be used separately or in combination for protein function analysis. The Family-wise Analysis of SubStructural Templates (FASST) method uses all-against-all substructure comparison to determine Substructural Clusters (SCs), which characterize the binding site substructural variation within a protein family. In this paper we focus on examples of automatically determined SCs that can be linked to phylogenetic distance between family members, to segregation by conformation, and to organization by homology among convergent protein lineages. The Motif Ensemble Statistical Hypothesis (MESH) framework constructs a representative motif for each protein cluster among the SCs determined by FASST and combines these motifs into motif ensembles. A series of function prediction experiments shows that these ensembles improve the function prediction power of existing motifs.

Conclusions

FASST contributes a critical feedback and assessment step to existing binding site substructure identification methods and can be used to investigate structure-function relationships thoroughly.
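The abstract does not say which comparison metric or clustering method FASST uses. The sketch below is therefore a hypothetical illustration of the general idea: all-against-all substructure comparison followed by clustering. It scores each pair of binding-site substructures with a superposition-free distance RMSD (dRMSD) over corresponding points. It then groups the substructures by single-linkage thresholding. Both techniques are swapped-in stand-ins. The names `drmsd`, `all_against_all`, and `single_linkage_clusters`, and the `threshold` value, are assumptions made for illustration. None of them describes the FASST procedure.

```python
import math
from itertools import combinations

# Hypothetical sketch: dRMSD comparison + single-linkage clustering.
# These are stand-in techniques, not the FASST implementation.


def drmsd(a, b):
    """Superposition-free distance RMSD between two substructures.

    `a` and `b` are equal-length lists of (x, y, z) coordinates for
    corresponding points (e.g. one atom per catalytic residue), assumed
    to be listed in the same residue order.
    """
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("substructures need the same number (>= 2) of points")
    # Compare every intra-substructure distance in `a` with its counterpart in `b`.
    diffs = [
        math.dist(a[i], a[j]) - math.dist(b[i], b[j])
        for i, j in combinations(range(len(a)), 2)
    ]
    return math.sqrt(sum(x * x for x in diffs) / len(diffs))


def all_against_all(subs):
    """Symmetric matrix of pairwise dRMSD values for a family of substructures."""
    n = len(subs)
    d = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d[i][j] = d[j][i] = drmsd(subs[i], subs[j])
    return d


def single_linkage_clusters(d, threshold):
    """Group indices linked by chains of pairs whose distance is <= threshold."""
    parent = list(range(len(d)))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in combinations(range(len(d)), 2):
        if d[i][j] <= threshold:
            parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(d)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


# Usage: two copies of the same triangular site (one translated) and one stretched site.
site = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
moved = [(5.0, 5.0, 5.0), (6.0, 5.0, 5.0), (5.0, 6.0, 5.0)]
stretched = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 3.0, 0.0)]
dist = all_against_all([site, moved, stretched])
print(single_linkage_clusters(dist, threshold=0.5))  # [[0, 1], [2]]
```

dRMSD was chosen here only because it needs no superposition step. A realistic pipeline would more likely superimpose the matched atoms by least squares and compare them by RMSD. Single linkage can also chain geometrically distinct groups together, so the choice of clustering method matters.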
MESH offers an automated, statistically rigorous procedure for incorporating structural variation data into protein function prediction pipelines. Our work provides an unbiased, automated assessment of the structural variability of identified binding site substructures among protein structure families, together with a technique for exploring how substructural variation relates to protein function. As the volume of available proteomic data continues to grow, the proposed techniques will be indispensable for the large-scale analysis and interpretation of structural data.
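The abstract describes MESH only at a high level. It builds one representative motif per substructural cluster, combines these motifs into an ensemble, and evaluates the ensemble with a statistical hypothesis test. The sketch below is a hypothetical illustration of that shape, not the MESH procedure. It picks each cluster's representative as the medoid of a pairwise distance matrix. It assumes that matching each representative motif against a query yields a p-value. It then combines those p-values with the Simes test, which is valid for the global null hypothesis (every individual motif null holds) under independence and some forms of positive dependence. The names `medoid`, `simes_pvalue`, and `ensemble_predicts`, the default `alpha=0.05`, and the example p-values are all assumptions.

```python
# Hypothetical sketch: medoid representatives + Simes-combined ensemble test.
# Not the MESH implementation; the p-values below are illustrative only.


def medoid(cluster, d):
    """Member of `cluster` with the smallest summed distance to the other members.

    A hypothetical choice of a cluster's representative motif; `d` is a
    pairwise distance matrix indexed like the cluster members.
    """
    return min(cluster, key=lambda i: sum(d[i][j] for j in cluster))


def simes_pvalue(pvalues):
    """Simes combined p-value: min over k of n * p_(k) / k, with p sorted ascending."""
    if not pvalues:
        raise ValueError("need at least one p-value")
    n = len(pvalues)
    ranked = sorted(pvalues)
    return min(1.0, min(n * p / k for k, p in enumerate(ranked, start=1)))


def ensemble_predicts(pvalues, alpha=0.05):
    """Predict the function if the combined evidence from all motifs is significant."""
    return simes_pvalue(pvalues) <= alpha


# Usage: a toy distance matrix and per-motif match p-values for one query.
d = [[0.0, 1.0, 4.0], [1.0, 0.0, 2.0], [4.0, 2.0, 0.0]]
print(medoid([0, 1, 2], d))                   # 1
print(ensemble_predicts([0.01, 0.03, 0.90]))  # True
print(ensemble_predicts([0.04, 0.20, 0.50]))  # False
```

Combining the per-motif evidence into a single test lets a query that matches only one cluster's geometry still be detected. It also avoids the inflated false positive rate of an uncorrected "any motif matches" rule.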
