Prediction of Lysine Glycation PTM site in Protein using Peptide Sequence Evolution based Features

Glycation is a non-enzymatic post-translational modification. It is closely associated with various biological functions and is implicated in many diseases, such as diabetes and renal failure. Identifying glycation sites is important for drug development and research, but identifying them manually in the laboratory is laborious, costly, and time-consuming. A computational tool that predicts glycation sites with high accuracy would therefore be very useful. In our experiment, we introduce a new feature extraction technique, called peptide sequence evolution based feature representation, which achieved an accuracy of 95.94 ± 0.54%, a sensitivity of 98.20%, and a specificity of 90.67 ± 1.07% after running 10-fold cross-validation five times. This result outperforms the previously developed tools BPB_GlySite, NetGlycate, PreGly, and Gly-PseAAC.
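For readers unfamiliar with the reported metrics, the following minimal sketch shows how accuracy, sensitivity, and specificity are conventionally computed from confusion-matrix counts, and how a "mean ± standard deviation" figure can be summarized over repeated cross-validation runs. The function names and the example counts are hypothetical and chosen only for illustration; they are not the data or the code used in this work.

```python
from statistics import mean, stdev


def confusion_metrics(tp, fn, tn, fp):
    """Return (accuracy, sensitivity, specificity) as fractions in [0, 1].

    tp/fn: glycation sites predicted correctly / missed.
    tn/fp: non-glycation sites predicted correctly / wrongly flagged.
    """
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    sensitivity = tp / (tp + fn)   # true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    return accuracy, sensitivity, specificity


def summarize(values):
    """Return (mean, sample standard deviation) of per-run scores."""
    return mean(values), stdev(values)


# Hypothetical pooled counts from five repetitions of 10-fold
# cross-validation (illustrative numbers only, not results from the paper).
runs = [
    (90, 10, 80, 20),
    (92, 8, 78, 22),
    (91, 9, 82, 18),
    (89, 11, 81, 19),
    (93, 7, 79, 21),
]

accuracies = [confusion_metrics(*counts)[0] for counts in runs]
acc_mean, acc_std = summarize(accuracies)
print(f"Accuracy: {100 * acc_mean:.2f} ± {100 * acc_std:.2f}%")
```

Whether the standard deviation is taken over repetitions or over individual folds is an assumption here; the sketch takes it over the five repetitions.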
