Investigation and identification of protein γ-glutamyl carboxylation sites

Background
Carboxylation is a post-translational modification of glutamate (Glu) residues, catalyzed by γ-glutamyl carboxylase in the lumen of the endoplasmic reticulum. Vitamin K is a critical co-factor in this conversion of Glu residues to γ-carboxyglutamate (Gla) residues. Carboxylation has been shown to be involved in the blood clotting cascade, bone growth, and extraosseous calcification. However, research in this field has been limited by the difficulty of experimentally determining the substrate site specificity of γ-glutamyl carboxylation. In silico investigation can help characterize carboxylated sites before experiments are carried out.

Results
Because of the importance of γ-glutamyl carboxylation in biological mechanisms, this study investigates the substrate site specificity of carboxylation sites. It considers not only the amino acid composition surrounding carboxylation sites but also their structural characteristics, including secondary structure and solvent-accessible surface area (ASA). These features are used to build a support vector machine (SVM) model that distinguishes carboxylation sites from non-carboxylation sites. In five-fold cross-validation, the SVM model trained with the combined features of positional weighted matrix (PWM), amino acid composition (AAC), and ASA yields the highest accuracy (0.892). An independent testing set is also constructed to evaluate whether the predictive model is over-fitted to the training set.

Conclusions
Evaluation on independent testing data, which were not used in cross-validation, shows that the proposed model can differentiate between carboxylation sites and non-carboxylation sites. This investigation is the first to study carboxylation sites and to develop a system for identifying them. The proposed method is a practical means of preliminary analysis that greatly reduces the number of potential carboxylation sites requiring experimental confirmation.
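The workflow summarized in this abstract (encode each Glu-centred sequence window as PWM and AAC features, train an SVM, and estimate accuracy by five-fold cross-validation) can be sketched in dependency-free Python. This is an illustrative sketch under stated assumptions, not the authors' implementation. The PWM encoding used here (the frequency, among training positives, of the residue observed at each window position), the 7-residue window, and all function names are hypothetical. ASA is omitted because it requires an external solvent-accessibility predictor. A linear SVM trained with the Pegasos subgradient method stands in for a full SVM library such as LIBSVM, and the toy windows are made up rather than real carboxylation data.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aac(window):
    """Amino acid composition (AAC): fraction of each of the 20 residues in the window.
    Padding symbols such as '-' are ignored."""
    residues = [r for r in window if r in AMINO_ACIDS]
    n = len(residues) or 1
    return [residues.count(a) / n for a in AMINO_ACIDS]


def build_pwm(windows, pseudocount=1.0):
    """Positional weight matrix: per-position residue frequencies (with a pseudocount)
    estimated from aligned windows of equal length."""
    pwm = []
    for i in range(len(windows[0])):
        counts = {a: pseudocount for a in AMINO_ACIDS}
        for w in windows:
            if w[i] in counts:
                counts[w[i]] += 1
        total = sum(counts.values())
        pwm.append({a: c / total for a, c in counts.items()})
    return pwm


def encode(window, pwm):
    """Combined feature vector: PWM score per position (hypothetical encoding), then AAC,
    then a constant 1.0 that acts as a regularized bias term for the linear SVM."""
    pwm_part = [pwm[i].get(r, 0.0) for i, r in enumerate(window)]
    return pwm_part + aac(window) + [1.0]


def train_linear_svm(X, y, lam=0.01, epochs=50, seed=0):
    """Linear SVM via Pegasos (stochastic subgradient descent on the L2-regularized
    hinge loss); a stand-in for LIBSVM, not the paper's solver. Labels are +1 / -1."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrink: gradient of the regularizer
            if margin < 1.0:  # hinge loss active: step toward y_i * x_i
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def predict(w, x):
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) >= 0 else -1


def cross_validate(windows, labels, k=5, seed=0):
    """k-fold cross-validation accuracy. The PWM is rebuilt from the positive training
    windows of each fold, so held-out windows never contribute to it."""
    idx = list(range(len(windows)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]
    correct = 0
    for f in range(k):
        test = set(folds[f])
        train = [i for i in idx if i not in test]
        pwm = build_pwm([windows[i] for i in train if labels[i] == 1])
        X = [encode(windows[i], pwm) for i in train]
        w = train_linear_svm(X, [labels[i] for i in train])
        correct += sum(predict(w, encode(windows[i], pwm)) == labels[i] for i in test)
    return correct / len(windows)


if __name__ == "__main__":
    # Made-up toy windows (7 residues, Glu at the centre); NOT real carboxylation data.
    positives = ["EEAEELE", "KEEEERE", "AEEEEEL", "EEVEEKE", "EELEEAE",
                 "QEEEEEV", "EERELEE", "EEEELEA", "LEEEEEK", "EAEEEER"]
    negatives = ["KLSEGTV", "PAVERSD", "GTLEKPS", "SVAEDLK", "TPGENVL",
                 "RSLEAGT", "VKTEPSG", "DGSEVLP", "LTKESGA", "NPVETRS"]
    windows = positives + negatives
    labels = [1] * len(positives) + [-1] * len(negatives)
    print(f"5-fold accuracy on toy data: {cross_validate(windows, labels):.3f}")
```

Rebuilding the PWM inside each fold keeps the held-out windows out of the matrix, which would otherwise inflate the cross-validation accuracy. `build_pwm` assumes every training fold contains at least one positive window.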
