Prediction of functionally important residues in globular proteins from unusual central distances of amino acids

Background
Well-performing automated approaches to protein function recognition usually combine several complementary techniques. Besides building a better consensus, their predictive power can be improved by adding or refining independent modules that explore orthogonal features of proteins. In this work, we demonstrate how global atomic distributions can be used to indicate functionally important residues.

Results
Using a set of carefully selected globular proteins, we parametrized continuous probability density functions describing the preferred central distances of individual protein atoms. Relative preferred burials were estimated using mixture models of radial density functions that depend on the amino acid composition of the protein under consideration. The unexpectedness of extraordinary atom locations was evaluated in an information-theoretic manner and used directly to identify key amino acids. In the validation study, we tested a tool built upon our approach, called SurpResi, by searching for ligand-binding sites. The tool indicated multiple candidate sites, achieving success rates comparable to those of several geometric methods. We also showed that unexpectedness is a property of regions involved in protein-protein interactions and can thus be used to rank protein docking predictions. The computational approach implemented in this work is freely available via a Web interface at http://www.bioinformatics.org/surpresi.

Conclusions
Probabilistic analysis of atomic central distances in globular proteins captures distinct orientational preferences of amino acids that result from the different sizes, charges and hydrophobic characters of their side chains. When idealized spatial preferences can be inferred from the amino acid composition of a protein alone, residues located in hydrophobically unfavorable environments can be easily detected. Such residues often turn out to be directly involved in binding ligands or interfacing with other proteins.
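The idea of scoring atoms by how unexpected their central distance is can be illustrated with a minimal Python sketch. It is not the SurpResi implementation. It assumes that the central distance is the distance of an atom from the geometric centroid of the structure, it swaps in a plain Gaussian density per atom class for the fitted radial density functions and mixture models described above, and the class names, the parameter values in `TOY_PARAMS`, and all function names are hypothetical placeholders.

```python
import math

# Hypothetical (mean, std. dev.) of the central distance per atom class, in angstroms.
# Illustrative placeholders only, not values fitted in the paper.
TOY_PARAMS = {"hydrophobic": (8.0, 3.0), "polar": (14.0, 3.0)}


def centroid(coords):
    """Geometric centre of a list of (x, y, z) points."""
    n = len(coords)
    return tuple(sum(p[i] for p in coords) / n for i in range(3))


def surprisal_bits(r, mean, sd):
    """-log2 of a Gaussian density at r, computed analytically to avoid underflow.

    This is a differential quantity, so it can be negative; only the
    ranking of atoms matters in this sketch.
    """
    z = (r - mean) / sd
    nats = 0.5 * z * z + math.log(sd * math.sqrt(2.0 * math.pi))
    return nats / math.log(2.0)


def rank_by_surprisal(atoms):
    """atoms: list of (label, atom_class, (x, y, z)).

    Returns the labels ordered from most to least unexpected location.
    """
    center = centroid([xyz for _, _, xyz in atoms])
    scored = []
    for label, atom_class, xyz in atoms:
        r = math.dist(xyz, center)  # central distance under the stated assumption
        mean, sd = TOY_PARAMS[atom_class]
        scored.append((surprisal_bits(r, mean, sd), label))
    scored.sort(reverse=True)
    return [label for _, label in scored]


# Toy structure: atom "E" is hydrophobic but sits far from the centre.
toy_atoms = [
    ("A", "hydrophobic", (8.0, 0.0, 0.0)),
    ("B", "hydrophobic", (-8.0, 0.0, 0.0)),
    ("C", "polar", (0.0, 14.0, 0.0)),
    ("D", "polar", (0.0, -14.0, 0.0)),
    ("E", "hydrophobic", (0.0, 0.0, 20.0)),
    ("F", "polar", (0.0, 0.0, -20.0)),
]
ranking = rank_by_surprisal(toy_atoms)
```

In this toy input the exposed hydrophobic atom "E" ranks first, which mirrors the abstract's point that residues in hydrophobically unfavorable environments stand out. A fuller version would presumably aggregate atom scores per residue and derive the densities from the amino acid composition of the protein.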
