iProtGly‐SS: Identifying protein glycation sites using sequence and structure based features

Glycation is a chemical reaction in which a sugar molecule bonds to a protein without the help of enzymes. It is often a cause of many diseases, so knowledge of glycation sites is important. In this paper, we present iProtGly‐SS, a method for identifying protein lysine glycation sites based on features extracted from sequence and secondary structure information. In our experiments, the best-performing combination of feature groups was Amino Acid Composition, Secondary Structure Motifs, and Polarity. We trained the model with a support vector machine classifier on an optimal feature set chosen by a group-based forward feature selection technique. On standard benchmark datasets, our method significantly outperforms existing methods for glycation prediction. A web server for iProtGly‐SS is implemented and publicly available at http://brl.uiu.ac.bd/iprotgly-ss/.
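As an illustration of the two ideas named in the abstract, the following minimal sketch computes an amino acid composition vector for a sequence window and runs a greedy forward selection over whole feature groups. It is not the authors' implementation: the function names, the group labels, and the `toy_score` values are hypothetical, and in practice the score would presumably come from cross-validated performance of a support vector machine trained on the selected groups.

```python
from typing import Callable, FrozenSet, List, Sequence, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def amino_acid_composition(window: str) -> List[float]:
    """Fraction of each standard residue in a sequence window."""
    residues = [r for r in window if r in AMINO_ACIDS]
    total = len(residues) or 1  # avoid division by zero on empty windows
    return [residues.count(a) / total for a in AMINO_ACIDS]


def group_forward_selection(
    groups: Sequence[str],
    score: Callable[[FrozenSet[str]], float],
) -> Tuple[List[str], float]:
    """Greedily add the feature group that most improves `score`.

    Stops as soon as no remaining group improves the best score so far.
    """
    selected: List[str] = []
    remaining = list(groups)
    best = float("-inf")
    while remaining:
        # Score each candidate group added to the current selection.
        trials = [(score(frozenset(selected + [g])), g) for g in remaining]
        top_score, top_group = max(trials, key=lambda t: t[0])
        if top_score <= best:
            break
        selected.append(top_group)
        remaining.remove(top_group)
        best = top_score
    return selected, best


def toy_score(subset: FrozenSet[str]) -> float:
    """Hypothetical stand-in for a cross-validated classifier score."""
    gains = {"AAC": 0.30, "SS_motifs": 0.25, "polarity": 0.10, "noise": -0.05}
    return sum(gains[g] for g in subset)


if __name__ == "__main__":
    print(amino_acid_composition("AAKK"))
    print(group_forward_selection(
        ["noise", "polarity", "SS_motifs", "AAC"], toy_score))
```

Selecting whole groups rather than individual features keeps the search small, since only a handful of groups are considered, and makes the result easy to interpret, at the cost of not pruning weak features inside a chosen group.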
