Predicting lysine phosphoglycerylation with fuzzy SVM by incorporating k-spaced amino acid pairs into Chou's general PseAAC.

As a new type of post-translational modification, lysine phosphoglycerylation plays a key role in regulating the glycolytic process and metabolism in cells. Because traditional experimental methods are time-consuming and labor-intensive, it is important to develop computational methods that identify potential phosphoglycerylation sites. However, the prediction performance of the existing phosphoglycerylation site predictor is not satisfactory. In this study, a novel predictor named CKSAAP_PhoglySite is developed to predict phosphoglycerylation sites by using the composition of k-spaced amino acid pairs (CKSAAP) and a fuzzy support vector machine. On the one hand, after assessing the encoding schemes from several aspects, we find that the composition of k-spaced amino acid pairs is more suitable than other encoding schemes for representing the protein sequence around phosphoglycerylation sites. On the other hand, the proposed fuzzy support vector machine algorithm can effectively handle the class imbalance and noise in the phosphoglycerylation site training dataset. Experimental results indicate that CKSAAP_PhoglySite significantly outperforms the existing phosphoglycerylation site predictor Phogly-PseAAC. A MATLAB software package for CKSAAP_PhoglySite can be freely downloaded from https://github.com/juzhe1120/Matlab_Software/blob/master/CKSAAP_PhoglySite_Matlab_Software.zip.
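
To make the encoding concrete, the following is a minimal Python sketch of a composition of k-spaced amino acid pairs as it is commonly defined; it is not taken from the authors' MATLAB package. The function name `cksaap`, the parameter `k_max`, the restriction to the 20 standard residues, and the normalization by the number of pair positions are all assumptions made for illustration.

```python
from itertools import product

# Assumption: only the 20 standard amino acids are counted; any other
# character (e.g. a padding symbol) is skipped.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
PAIR_INDEX = {pair: i for i, pair in enumerate(PAIRS)}


def cksaap(peptide, k_max=2):
    """Return a CKSAAP-style feature vector of length 400 * (k_max + 1).

    For each gap k in 0..k_max, residue pairs separated by k positions are
    counted and divided by the number of pair positions in the peptide.
    Both `cksaap` and `k_max` are hypothetical names for this sketch.
    """
    features = []
    for k in range(k_max + 1):
        counts = [0] * len(PAIRS)
        n_positions = len(peptide) - k - 1  # pair positions for this gap
        for i in range(max(n_positions, 0)):
            idx = PAIR_INDEX.get(peptide[i] + peptide[i + k + 1])
            if idx is not None:  # skip pairs with non-standard residues
                counts[idx] += 1
        denom = n_positions if n_positions > 0 else 1
        features.extend(c / denom for c in counts)
    return features


# Toy peptide, chosen only to keep the arithmetic easy to follow.
vec = cksaap("AKAKA", k_max=1)
print(len(vec), vec[PAIR_INDEX["AK"]])
```

For the toy peptide `AKAKA` with `k_max=1`, the vector has 800 entries, and the adjacent pair `AK` occupies two of the four k = 0 positions, giving a frequency of 0.5. In a pipeline like the one described, such vectors would be computed for the sequence window around each candidate lysine and passed to the classifier.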
