Automatic generation of 3D motifs for classification of protein binding sites

Background: Since many of the new protein structures delivered by high-throughput processes have no known function, there is a need for structure-based prediction of protein function. Protein 3D structures can be clustered according to their fold or secondary structures to produce classes of some functional significance. A recent alternative has been to detect specific 3D motifs, which are often associated with active sites. Unfortunately, very few 3D motifs are known compared to the number of sequential motifs already known, and they are usually the result of a manual process. In this paper, we report a method to automatically generate 3D motifs of protein structure binding sites based on consensus atom positions and evaluate it on a set of adenine-based ligands.

Results: Our new approach was validated by automatically generating 3D patterns for the main adenine-based ligands, i.e. AMP, ADP and ATP. Out of the 18 detected patterns, only one, the ADP4 pattern, is not associated with well-defined structural patterns. Moreover, most of the patterns could be classified as binding site 3D motifs. Literature research revealed that the ADP4 pattern actually corresponds to structural features which show complex evolutionary links between ligases and transferases. Therefore, all of the generated patterns prove to be meaningful. Each pattern was used to query all PDB proteins which bind either purine-based or guanine-based ligands, in order to evaluate the classification and annotation properties of the pattern. Overall, our 3D patterns matched 31% of proteins with adenine-based ligands and 95.5% of them were classified correctly.

Conclusion: A new metric has been introduced that allows proteins to be classified according to the similarity of the atomic environment of their binding sites, and a methodology has been developed to automatically produce 3D patterns from that classification. A study of proteins binding adenine-based ligands showed that these 3D patterns are not only biochemically meaningful, but can also be used for protein classification and annotation.

Background
Structural genomics projects aim at high-throughput delivery of protein structures regardless of the state of their functional annotation. Moreover, roughly half of gene

Published: 30 August 2007
BMC Bioinformatics 2007, 8:321 doi:10.1186/1471-2105-8-321
Received: 16 February 2007
Accepted: 30 August 2007
This article is available from: http://www.biomedcentral.com/1471-2105/8/321
© 2007 Nebel et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
