Feature amplified voting algorithm for functional analysis of protein superfamily

Background: Identifying the regions associated with protein function is an important task in the post-genomic era. Biological studies often identify functional enzyme residues from amino acid sequences, particularly when related structural information is unavailable. In some protein superfamilies, functional residues are difficult to detect with current alignment tools or evolutionary strategies because phylogenetic relationships do not parallel protein function. The solution proposed in this study is the Feature Amplified Voting Algorithm with Three-profile alignment (FAVAT). The core concept of FAVAT is to reveal the desired features of a target enzyme or protein by voting on three different property groups aligned by a three-profile alignment method. Functional residues of the target protein can then be retrieved by FAVAT analysis. The amidohydrolase superfamily was chosen as a test case for verifying the proposed approach because it contains divergent enzymes and proteins.

Results: FAVAT was used to identify critical residues of mammalian imidase, a member of the amidohydrolase superfamily. Members of this superfamily were first classified by their functional properties and source organisms. After FAVAT analysis, candidate residues were identified and compared to a bacterial hydantoinase whose crystal structure (1GKQ) has been fully elucidated. One modified lysine, three histidines and one aspartate were found to participate in the coordination of metal ions in the active site. The FAVAT analysis also corrected the misrecognition of the metal coordinator Asp57 by the multiple sequence alignment (MSA) method. Several other amino acid residues known to be related to the function or structure of mammalian imidase were also identified.

Conclusions: FAVAT is shown to predict functionally important amino acids in the amidohydrolase superfamily. This strategy identifies functionally important residues by analyzing the discrepancy between the sequence and functional properties of related proteins in a superfamily, and it should be applicable to other protein families.
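The abstract does not give the voting rule itself, so the following Python sketch only illustrates the general idea of column-wise voting across three aligned groups. It is not the FAVAT implementation. It assumes the three groups have already been aligned to a common length (the three-profile alignment step is not shown), and the function names, the conservation threshold, the vote count and the toy sequences are all hypothetical.

```python
from collections import Counter

GAP = "-"


def column_consensus(group, col):
    """Return (most common non-gap residue, its frequency) for one column of one group."""
    residues = [seq[col] for seq in group]
    counts = Counter(r for r in residues if r != GAP)
    if not counts:
        return GAP, 0.0
    residue, n = counts.most_common(1)[0]
    # Frequency is taken over all sequences, so gaps lower the score.
    return residue, n / len(residues)


def vote_columns(groups, threshold=0.8, min_votes=3):
    """Hypothetical voting rule: a group votes for a column when its consensus
    residue reaches `threshold`; columns with at least `min_votes` are reported
    together with the consensus residue of each group."""
    length = len(groups[0][0])
    hits = []
    for col in range(length):
        consensus = [column_consensus(g, col) for g in groups]
        votes = sum(1 for _, freq in consensus if freq >= threshold)
        if votes >= min_votes:
            hits.append((col, "".join(res for res, _ in consensus)))
    return hits


# Toy, made-up sequences standing in for three pre-aligned property groups.
group_a = ["MHDKAG", "MHDKSG", "MHDKTG"]
group_b = ["LHDKAV", "LHDRAV", "LHDKCV"]
group_c = ["-HDKWY", "QHDKFY", "-HDKLY"]

candidates = vote_columns([group_a, group_b, group_c])
print(candidates)
```

In this toy input, columns 1 and 2 are conserved with the same residue in all three groups, while column 5 is conserved within each group but differs between groups. A real analysis would need the actual grouping criteria and scoring described in the full paper.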
