Prediction and Analysis of Post-Translational Pyruvoyl Residue Modification Sites from Internal Serines in Proteins

Most pyruvoyl-dependent proteins observed in prokaryotes and eukaryotes are critical regulatory enzymes and are primary targets of inhibitors for anti-cancer and anti-parasitic therapy. These proteins undergo an autocatalytic, intramolecular self-cleavage reaction in which a covalently bound pyruvoyl group is generated on a conserved serine residue. The modified serine sites are traditionally detected by experimental approaches, which are often labor-intensive and time-consuming. In this study, we made an initial attempt at computational prediction of such serine sites, using feature selection based on a Random Forest. Because only a small number of experimentally verified pyruvoyl-modified proteins are available in the current version of the protein database, the dataset used in this study is small. After removing proteins with sequence identities >60%, we obtained a non-redundant dataset of only 46 proteins, each with one pyruvoyl serine site. Several types of features were considered in our method, including PSSM conservation scores, disorder, secondary structure, solvent accessibility, amino acid factors and amino acid occurrence frequencies. The method performed well on our dataset: the best results were 100.00% accuracy and an MCC of 1.0000 on the training dataset, and 93.75% accuracy and an MCC of 0.8441 on the testing dataset. The optimal feature set contained 9 features. Analysis of the optimal feature set indicated that some specific features play important roles in determining the pyruvoyl serine sites, consistent with several results of earlier experimental studies. These selected features may shed light on the mechanism of the post-translational self-maturation process and provide guidelines for experimental validation. As more pyruvoyl-modified proteins are found, the method should be evaluated on larger datasets in future work. The prediction software can be downloaded from http://www.nkbiox.com/sub/pyrupred/index.html.
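For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how accuracy and the Matthews correlation coefficient (MCC) are conventionally computed from the counts of a binary confusion matrix. The function names and the example counts are hypothetical illustrations; they are not taken from the study and do not reproduce its confusion matrix.

```python
import math


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all predictions that are correct."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient for a binary classifier.

    Returns 0.0 when the denominator is zero (a common convention
    when any row or column of the confusion matrix is empty).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    # Hypothetical counts, chosen only for illustration.
    tp, tn, fp, fn = 7, 8, 0, 1
    print(f"accuracy = {accuracy(tp, tn, fp, fn):.4f}")
    print(f"MCC      = {mcc(tp, tn, fp, fn):.4f}")
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, so it stays informative when positive sites (modified serines) are much rarer than negative ones.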
