Identification of subfamily-specific sites based on active sites modeling and clustering

MOTIVATION: Current computational approaches to function prediction mostly rely on protein sequence classification and on transferring annotation from known proteins to their closest homologs, on the assumption that orthologs conserve function. This approach has a major weakness: annotation reliability depends on global sequence similarity to known proteins, so it performs poorly for enzyme superfamilies whose members catalyze different reactions. Structural biology offers a complementary strategy by adding information about protein 3D structures. This information can be used to identify the amino acids located in active sites and to detect functionally polymorphic residues within an enzyme superfamily. Structural genomics programs are delivering novel protein structures at a high rate, yet a large gap remains between the number of known sequences and the number of available structures. Computational methods such as homology modeling provide reliable ways to bridge this gap and could offer a new, precise tool for annotating protein function.

RESULTS: Here, we present Active Sites Modeling and Clustering (ASMC), a novel unsupervised method that classifies sequences using structural information about protein pockets. ASMC combines homology modeling of family members, structural alignment of the modeled active sites, and a subsequent hierarchical conceptual classification. Comparing the profiles obtained from the computed clusters identifies residues correlated with functional divergence between subfamilies, called specificity determining positions. ASMC has been validated on a benchmark of 42 Pfam families for which previously resolved holo-structures were available. It was also applied to several families containing known protein structures and comprehensive functional annotations. We discuss how ASMC improves the annotation and understanding of protein family functions, with illustrative examples from nucleotidyl cyclases, protein kinases and serine proteases.

AVAILABILITY: http://www.genoscope.fr/ASMC/.
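
The profile comparison step described above can be illustrated with a small sketch. This is not the ASMC implementation: it assumes the active-site residues have already been aligned into equal-length strings and grouped into clusters, and it uses a simple, hypothetical criterion (a position is a candidate specificity determining position if every cluster is conserved there but the clusters do not all share the same residue). The function names, the `min_conservation` threshold and the toy sequences are all assumptions made for illustration.

```python
from collections import Counter


def column_profile(sites, pos):
    """Residue frequencies at one aligned position across a set of sites."""
    counts = Counter(site[pos] for site in sites)
    total = sum(counts.values())
    return {res: n / total for res, n in counts.items()}


def candidate_sdps(clusters, min_conservation=0.8):
    """Positions conserved within every cluster but not identical across clusters.

    `clusters` maps a cluster name to a list of aligned active-site strings.
    `min_conservation` is an illustrative threshold, not a value from ASMC.
    """
    n_pos = len(next(iter(clusters.values()))[0])
    hits = []
    for pos in range(n_pos):
        dominant = []
        for sites in clusters.values():
            profile = column_profile(sites, pos)
            res, freq = max(profile.items(), key=lambda kv: kv[1])
            if freq < min_conservation:
                break  # this cluster is not conserved at this position
            dominant.append(res)
        else:
            # every cluster is conserved; keep the position if residues differ
            if len(set(dominant)) > 1:
                hits.append(pos)
    return hits


# Toy, made-up active-site strings (not real data).
clusters = {
    "cluster_A": ["GDEKS", "GDEKS", "GDEKT"],
    "cluster_B": ["GDCRS", "GDCRT", "GDCRS"],
}
print(candidate_sdps(clusters))
```

In this toy input, positions 2 and 3 are reported because each cluster is fully conserved there on a different residue, while position 4 is skipped because neither cluster reaches the conservation threshold.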
