Fast and automated functional classification with MED‐SuMo: An application on purine‐binding proteins

Ligand–protein interactions are essential for biological processes, and precise characterization of protein binding sites is crucial to understanding protein function. MED‐SuMo is a powerful technology for locating similar local regions on protein surfaces. Its heuristic is based on a 3D representation of macromolecules that uses specific surface chemical features, which associate chemical characteristics with geometrical properties. MED‐SMA is a fast, automated method for classifying binding sites. It relies on MED‐SuMo technology to build a similarity graph, which it then partitions with the Markov Clustering algorithm. Purine binding sites are well studied as drug targets. Here, the purine binding sites of the Protein Data Bank (PDB) are classified, so that proteins potentially inhibited or activated through the same mechanism are gathered. Results are analyzed according to PROSITE annotations and to carefully refined functional annotations extracted from the PDB. As expected, binding sites associated with related mechanisms are gathered, for example, those of the small GTPases. However, protein kinases from different kinome families are also found together, for example, Aurora‐A and CDK2, which are inhibited by the same drugs. Representative examples of different clusters are presented. The effectiveness of the MED‐SMA approach is demonstrated, as it gathers binding sites of proteins with similar structure‐activity relationships. Moreover, an efficient new protocol assigns structures lacking cocrystallized ligands to the purine clusters, enabling those structures to be associated with a specific binding mechanism. Applications of this classification by binding mode similarity include target‐based drug design and the prediction of cross‐reactivity, and therefore of potential toxic side effects.
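The clustering step named above can be illustrated with a minimal sketch of generic Markov Clustering on a weighted similarity graph. This is not the MED‐SMA implementation: the function names, the inflation value, the thresholds, and the toy similarity matrix are all assumptions chosen for illustration, and in practice the edge weights would come from MED‐SuMo comparison scores.

```python
# Minimal, illustrative Markov Clustering (MCL) on a small similarity matrix.
# All names and parameter values are hypothetical, not taken from MED-SMA.

def normalize_columns(m):
    """Rescale each column so that it sums to 1 (column-stochastic matrix)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)]
            for i in range(n)]

def expand(m):
    """Expansion step: square the matrix (two-step random walks)."""
    n = len(m)
    return [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def inflate(m, power):
    """Inflation step: raise entries to a power, then renormalize columns."""
    return normalize_columns([[x ** power for x in row] for row in m])

def markov_cluster(similarity, inflation=2.0, max_iter=100, tol=1e-9,
                   threshold=1e-6):
    """Return clusters (sorted lists of node indices) for a symmetric matrix."""
    n = len(similarity)
    # Add self-loops, a common MCL convention that stabilizes the iteration.
    m = normalize_columns(
        [[float(similarity[i][j]) + (1.0 if i == j else 0.0)
          for j in range(n)] for i in range(n)])
    for _ in range(max_iter):
        new = inflate(expand(m), inflation)
        delta = max(abs(new[i][j] - m[i][j])
                    for i in range(n) for j in range(n))
        m = new
        if delta < tol:
            break
    # Group nodes that still exchange flow: union row i with column j
    # whenever the remaining flow m[i][j] is above the threshold.
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i in range(n):
        for j in range(n):
            if m[i][j] > threshold:
                parent[find(i)] = find(j)
    groups = {}
    for node in range(n):
        groups.setdefault(find(node), []).append(node)
    return sorted(groups.values())

# Toy graph: two tightly similar triplets of sites joined by one weak edge.
toy_similarity = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0.1, 0, 0],
    [0, 0, 0.1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
clusters = markov_cluster(toy_similarity)
```

In this sketch, a higher inflation value tends to produce smaller, tighter clusters, which is the usual knob for tuning the granularity of such a classification.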
