pLoc-mPlant: predict subcellular localization of multi-location plant proteins by incorporating the optimal GO information into general PseAAC.

One of the fundamental goals in cellular biochemistry is to identify the functions of proteins in the context of compartments that organize them in the cellular environment. To realize this, it is indispensable to develop an automated method for fast and accurate identification of the subcellular locations of uncharacterized proteins. The current study is focused on plant protein subcellular location prediction based on the sequence information alone. Although considerable efforts have been made in this regard, the problem is far from being solved yet. Most of the existing methods can be used to deal with single-location proteins only. Actually, proteins with multi-locations may have some special biological functions. This kind of multiplex protein is particularly important for both basic research and drug design. Using the multi-label theory, we present a new predictor called "pLoc-mPlant" by extracting the optimal GO (Gene Ontology) information into the Chou's general PseAAC (Pseudo Amino Acid Composition). Rigorous cross-validation on the same stringent benchmark dataset indicated that the proposed pLoc-mPlant predictor is remarkably superior to iLoc-Plant, the state-of-the-art method for predicting plant protein subcellular localization. To maximize the convenience of most experimental scientists, a user-friendly web-server for the new predictor has been established at , by which users can easily get their desired results without the need to go through the complicated mathematics involved.

[1]  K. Chou,et al.  iNitro-Tyr: Prediction of Nitrotyrosine Sites in Proteins with General Pseudo Amino Acid Composition , 2014, PloS one.

[2]  S. Khan,et al.  Unb-DPC: Identify mycobacterial membrane protein types by incorporating un-biased dipeptide composition into Chou's general PseAAC. , 2017, Journal of theoretical biology.

[3]  K. Chou,et al.  iCar-PseCp: identify carbonylation sites in proteins by Monte Carlo sampling and incorporating sequence coupled effects into general PseAAC , 2016, Oncotarget.

[4]  Kuo-Chen Chou,et al.  2L-piRNA: A Two-Layer Ensemble Classifier for Identifying Piwi-Interacting RNAs and Their Function , 2017, Molecular therapy. Nucleic acids.

[5]  K. Chou,et al.  Plant-mPLoc: A Top-Down Strategy to Augment the Power for Predicting Plant Protein Subcellular Localization , 2010, PloS one.

[6]  R. Murphy,et al.  Automated subcellular location determination and high-throughput microscopy. , 2007, Developmental cell.

[7]  Zhanchao Li,et al.  Using Chou's amphiphilic pseudo-amino acid composition and support vector machine for prediction of enzyme subfamily classes. , 2007, Journal of theoretical biology.

[8]  K. Chou,et al.  Prediction of protein structural classes. , 1995, Critical reviews in biochemistry and molecular biology.

[9]  K. Chou Some remarks on protein attribute prediction and pseudo amino acid composition , 2010, Journal of Theoretical Biology.

[10]  K. Chou,et al.  pRNAm-PC: Predicting N(6)-methyladenosine sites in RNA sequences via physical-chemical properties. , 2016, Analytical biochemistry.

[11]  Kuo-Chen Chou,et al.  iPTM-mLys: identifying multiple lysine PTM sites and their different types , 2016, Bioinform..

[12]  K. Chou,et al.  iRNA-PseColl: Identifying the Occurrence Sites of Different RNA Modifications by Incorporating Collective Effects of Nucleotides into PseKNC , 2017, Molecular therapy. Nucleic acids.

[13]  K. Chou,et al.  Pseudo nucleotide composition or PseKNC: an effective formulation for analyzing genomic sequences. , 2015, Molecular bioSystems.

[14]  K. Chou,et al.  iPGK-PseAAC: Identify Lysine Phosphoglycerylation Sites in Proteins by Incorporating Four Different Tiers of Amino Acid Pairwise Coupling Information into the General PseAAC. , 2017, Medicinal chemistry (Shariqah (United Arab Emirates)).

[15]  S. Kung,et al.  GOASVM: a subcellular location predictor by incorporating term-frequency gene ontology into the general form of Chou's pseudo-amino acid composition. , 2013, Journal of theoretical biology.

[16]  Kuo-Chen Chou,et al.  iPreny-PseAAC: Identify C-terminal Cysteine Prenylation Sites in Proteins by Incorporating Two Tiers of Sequence Couplings into PseAAC. , 2017, Medicinal chemistry (Shariqah (United Arab Emirates)).

[17]  K. Chou,et al.  iLoc-Virus: a multi-label learning classifier for identifying the subcellular localization of virus proteins with both single and multiple sites. , 2011, Journal of theoretical biology.

[18]  Kuo-Chen Chou,et al.  iATC-mHyb: a hybrid multi-label classifier for predicting the classification of anatomical therapeutic chemicals , 2017, Oncotarget.

[19]  K. Chou,et al.  iAMP-2L: a two-level multi-label classifier for identifying antimicrobial peptides and their functional types. , 2013, Analytical biochemistry.

[20]  K. Chou,et al.  iLoc-Plant: a multi-label classifier for predicting the subcellular localization of plant proteins with both single and multiple sites. , 2011, Molecular bioSystems.

[21]  Sukanta Mondal,et al.  Chou's pseudo amino acid composition improves sequence-based antifreeze protein prediction. , 2014, Journal of theoretical biology.

[22]  M. Esmaeili,et al.  Using the concept of Chou's pseudo amino acid composition for risk type prediction of human papillomaviruses. , 2010, Journal of theoretical biology.

[23]  Wei Chen,et al.  iRNA-AI: identifying the adenosine to inosine editing sites in RNA sequences , 2016, Oncotarget.

[24]  G P Zhou,et al.  Some insights into protein structural class prediction , 2001, Proteins.

[25]  Kuo-Chen Chou,et al.  Some remarks on predicting multi-label attributes in molecular biosystems. , 2013, Molecular bioSystems.

[26]  K. Chou,et al.  Hum-mPLoc: an ensemble classifier for large-scale human protein subcellular location prediction by incorporating samples with multiple sites. , 2007, Biochemical and biophysical research communications.

[27]  Dinesh Gupta,et al.  Identifying Bacterial Virulent Proteins by Fusing a Set of Classifiers Based on Variants of Chou's Pseudo Amino Acid Composition and on Evolutionary Information , 2012, IEEE/ACM Transactions on Computational Biology and Bioinformatics.

[28]  K. Chou Pseudo Amino Acid Composition and its Applications in Bioinformatics, Proteomics and System Biology , 2009 .

[29]  Kuo-Chen Chou,et al.  An Unprecedented Revolution in Medicinal Chemistry Driven by the Progress of Biological Science. , 2017, Current topics in medicinal chemistry.

[30]  K. Chou,et al.  iLoc-Euk: A Multi-Label Classifier for Predicting the Subcellular Localization of Singleplex and Multiplex Eukaryotic Proteins , 2011, PloS one.

[31]  Kuo-Chen Chou,et al.  A New Method for Predicting the Subcellular Localization of Eukaryotic Proteins with Both Single and Multiple Sites: Euk-mPLoc 2.0 , 2010, PloS one.

[32]  Kuo-Chen Chou,et al.  Prediction of enzyme family classes. , 2003, Journal of proteome research.

[33]  K. Chou,et al.  iSNO-AAPair: incorporating amino acid pairwise coupling into PseAAC for predicting cysteine S-nitrosylation sites in proteins , 2013, PeerJ.

[34]  Ren Long,et al.  dRHP-PseRA: detecting remote homology proteins using profile-based pseudo protein sequence and rank aggregation , 2016, Scientific Reports.

[35]  Hao Lin The modified Mahalanobis Discriminant for predicting outer membrane proteins by using Chou's pseudo amino acid composition. , 2008, Journal of theoretical biology.

[36]  Kuo-Chen Chou,et al.  MemType-2L: a web server for predicting membrane proteins and their types by incorporating evolution information through Pse-PSSM. , 2007, Biochemical and biophysical research communications.

[37]  M. Bakhtiarizadeh,et al.  OOgenesis_Pred: A sequence-based method for predicting oogenesis proteins by six different modes of Chou's pseudo amino acid composition. , 2017, Journal of theoretical biology.

[38]  Kuo-Chen Chou,et al.  Predicting membrane protein types by the LLDA algorithm. , 2008, Protein and peptide letters.

[39]  B. Liu,et al.  Pse-Analysis: a python package for DNA/RNA and protein/peptide sequence analysis based on pseudo components and kernel methods , 2017, Oncotarget.

[40]  Kuo-Chen Chou,et al.  iPhos-PseEn: Identifying phosphorylation sites in proteins by fusing different pseudo components into an ensemble classifier , 2016, Oncotarget.

[41]  Kuo-Chen Chou,et al.  pSuc-Lys: Predict lysine succinylation sites in proteins with PseAAC and ensemble random forest approach. , 2016, Journal of theoretical biology.

[42]  Kuo-Chen Chou,et al.  Using amphiphilic pseudo amino acid composition to predict enzyme subfamily classes , 2005, Bioinform..

[43]  Thanaruk Theeramunkong,et al.  Predict Subcellular Locations of Singleplex and Multiplex Proteins by Semi-Supervised Learning and Dimension-Reducing General Mode of Chou's PseAAC , 2013, IEEE Transactions on NanoBioscience.

[44]  K. Chou,et al.  Recent progress in protein subcellular location prediction. , 2007, Analytical biochemistry.

[45]  Daoqiang Zhang,et al.  Identification of associations between genotypes and longitudinal phenotypes via temporally-constrained group sparse canonical correlation analysis , 2017, Bioinform..

[46]  Kuo-Chen Chou,et al.  iPPI-Esml: An ensemble classifier for identifying the interactions of proteins by incorporating their physicochemical properties and wavelet transforms into PseAAC. , 2015, Journal of theoretical biology.

[47]  Maqsood Hayat,et al.  iNuc-STNC: a sequence-based predictor for identification of nucleosome positioning in genomes by extending the concept of SAAC and Chou's PseAAC. , 2016, Molecular bioSystems.

[48]  K. Nakai Protein sorting signals and prediction of subcellular localization. , 2000, Advances in protein chemistry.

[49]  Kuo-Chen Chou,et al.  Predicting eukaryotic protein subcellular location by fusing optimized evidence-theoretic K-Nearest Neighbor classifiers. , 2006, Journal of proteome research.

[50]  Kuo-Chen Chou,et al.  A new hybrid approach to predict subcellular localization of proteins by incorporating gene ontology. , 2003, Biochemical and biophysical research communications.

[51]  Saeed Ahmad,et al.  Identification of Heat Shock Protein families and J-protein types by incorporating Dipeptide Composition into Chou's general PseAAC , 2015, Comput. Methods Programs Biomed..

[52]  K. Chou,et al.  Cell-PLoc: a package of Web servers for predicting subcellular localization of proteins in various organisms , 2008, Nature Protocols.

[53]  Kuo-Chen Chou,et al.  iHyd-PseCp: Identify hydroxyproline and hydroxylysine in proteins by incorporating sequence-coupled effects into general PseAAC , 2016, Oncotarget.

[54]  Prabina Kumar Meher,et al.  Predicting antimicrobial peptides with improved accuracy by incorporating the compositional, physico-chemical and structural features into Chou’s general PseAAC , 2017, Scientific Reports.

[55]  Kuo-Chen Chou,et al.  iATC‐mISF: a multi‐label classifier for predicting the classes of anatomical therapeutic chemicals , 2016, Bioinform..

[56]  K. Chou,et al.  iLoc-Animal: a multi-label learning classifier for predicting subcellular localization of animal proteins. , 2013, Molecular bioSystems.

[57]  K. Chou Impacts of bioinformatics to medicinal chemistry. , 2015, Medicinal chemistry (Shariqah (United Arab Emirates)).

[58]  Kuo-Chen Chou,et al.  A Multi-Label Classifier for Predicting the Subcellular Localization of Gram-Negative Bacterial Proteins with Both Single and Multiple Sites , 2011, PloS one.

[59]  Junjie Chen,et al.  Pse-in-One: a web server for generating various modes of pseudo components of DNA, RNA, and protein sequences , 2015, Nucleic Acids Res..

[60]  Kuo-Chen Chou,et al.  A top-down approach to enhance the power of predicting human protein subcellular localization: Hum-mPLoc 2.0. , 2009, Analytical biochemistry.

[61]  A. Esmaeili,et al.  Prediction of GABAA receptor proteins using the concept of Chou's pseudo-amino acid composition and support vector machine. , 2011, Journal of theoretical biology.

[62]  Maqsood Hayat,et al.  Author ' s Accepted Manuscript Classification of membrane protein types using Voting feature interval in combination with Chou ' s pseudo amino acid composition , 2015 .

[63]  K. Chou Prediction of protein cellular attributes using pseudo‐amino acid composition , 2001, Proteins.

[64]  K. Chou,et al.  iLoc-Hum: using the accumulation-label scale to predict subcellular locations of human proteins with both single and multiple sites. , 2012, Molecular bioSystems.