Multi‐iPPseEvo: A Multi‐label Classifier for Identifying Human Phosphorylated Proteins by Incorporating Evolutionary Information into Chou's General PseAAC via Grey System Theory

Predicting phosphorylated proteins is a challenging problem, particularly when the query proteins carry multiple labels, meaning that they may be phosphorylated at two or more different types of amino acid residue. In fact, human proteins are usually phosphorylated at serine, threonine, and tyrosine. Here we propose a multi‐label predictor, called Multi‐iPPseEvo, that can handle systems containing both single‐ and multi‐label phosphorylated proteins. It was developed by (1) incorporating protein sequence evolutionary information into the general pseudo amino acid composition (PseAAC) via grey system theory, (2) balancing the skewed training datasets with the asymmetric bootstrap approach, and (3) constructing an ensemble predictor by fusing an array of individual random forest classifiers through a voting system. Rigorous cross‐validation with a set of multi‐label metrics indicates that the predictor is very promising. The current approach represents a new strategy for dealing with multi‐label biological problems, and the software is freely available for academic use at http://www.jci‐bioinfo.cn/Multi‐iPPseEvo.
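Steps (2) and (3) can be illustrated with a minimal sketch. It is not the published implementation, and it rests on several assumptions: one binary ensemble is trained per residue type, the balancing step keeps every minority-class sample and draws an equal number of majority-class samples with replacement, and a nearest-centroid rule (the hypothetical `train_stub`) stands in for a random forest so that the example needs only the standard library. All function names and the toy two-dimensional feature vectors are illustrative.

```python
import random
from collections import Counter

def balanced_resample(minority, majority, rng):
    """Keep all minority samples; draw as many majority samples with replacement."""
    return list(minority), [rng.choice(majority) for _ in minority]

def centroid(vectors):
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def train_stub(pos, neg):
    """Hypothetical stand-in for a random forest: a nearest-centroid rule."""
    cp, cn = centroid(pos), centroid(neg)
    return lambda x: 1 if sq_dist(x, cp) < sq_dist(x, cn) else 0

def train_ensemble(pos, neg, n_members=5, seed=0):
    """Train n_members base learners, each on its own balanced resample."""
    rng = random.Random(seed)
    members = []
    for _ in range(n_members):
        if len(pos) <= len(neg):
            p, n = balanced_resample(pos, neg, rng)
        else:
            n, p = balanced_resample(neg, pos, rng)
        members.append(train_stub(p, n))
    return members

def vote(members, x):
    """Majority vote; an odd n_members avoids ties for binary outputs."""
    return Counter(m(x) for m in members).most_common(1)[0][0]

def predict_labels(ensembles, x):
    """Return the set of labels whose ensemble votes positive for x."""
    return {label for label, members in ensembles.items() if vote(members, x) == 1}

# Toy data (assumed): feature 0 is high for "S" proteins, feature 1 for "T".
ensembles = {
    "S": train_ensemble(pos=[(1, 0), (1, 1)],
                        neg=[(0, 0), (0, 1), (0, 0.5), (0, 0.2)]),
    "T": train_ensemble(pos=[(0, 1), (1, 1)],
                        neg=[(0, 0), (1, 0), (0.5, 0), (0.2, 0)]),
}
both = predict_labels(ensembles, (1, 1))    # query resembling both classes
only_t = predict_labels(ensembles, (0, 1))  # query resembling "T" only
```

Because each label has its own ensemble, a query can receive zero, one, or several labels, which is what lets a single predictor cover both single‐ and multi‐label proteins.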
