An Ensemble Classifier for Eukaryotic Protein Subcellular Location Prediction Using Gene Ontology Categories and Amino Acid Hydrophobicity

With the rapid increase of protein sequences in the post-genomic age, it is challenging to develop accurate, automated methods that predict their subcellular localizations reliably and quickly. Many approaches have been proposed so far, but most of them rely on a single algorithm. In this paper, we propose an ensemble classifier that combines KNN (k-nearest neighbor) and SVM (support vector machine) algorithms through a voting system to predict the subcellular localization of eukaryotic proteins. With the one-versus-one strategy, the overall prediction accuracies are 78.17%, 89.94% and 75.55% on three benchmark datasets of eukaryotic proteins. The improved prediction accuracies indicate that GO annotations and the hydrophobicity of amino acids help to predict the subcellular locations of eukaryotic proteins.
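As an illustration of the voting idea only, the following minimal sketch combines the labels produced by several base classifiers through a plain majority vote. The abstract does not specify the voting rule, so the tie-breaking rule (the first listed classifier wins), the function names `majority_vote` and `ensemble_predict`, and the example labels are all assumptions made for this sketch, not the method used in the paper.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the most frequent label; ties go to the earliest voter's label.

    The tie-breaking rule is an assumption of this sketch.
    """
    if not predictions:
        raise ValueError("need at least one prediction")
    counts = Counter(predictions)
    best = max(counts.values())
    # Walk the votes in order so that ties resolve to the first classifier listed.
    for label in predictions:
        if counts[label] == best:
            return label


def ensemble_predict(per_classifier_labels):
    """Combine label lists (one list per base classifier) protein by protein."""
    return [majority_vote(list(votes)) for votes in zip(*per_classifier_labels)]


# Hypothetical outputs of three base classifiers on four proteins.
knn_k1 = ["nucleus", "cytoplasm", "mitochondrion", "nucleus"]
knn_k5 = ["nucleus", "nucleus", "mitochondrion", "cytoplasm"]
svm = ["cytoplasm", "nucleus", "mitochondrion", "extracellular"]

print(ensemble_predict([knn_k1, knn_k5, svm]))
# → ['nucleus', 'nucleus', 'mitochondrion', 'nucleus']
```

In practice, the per-classifier label lists would come from trained KNN and SVM models. A weighted vote is a common alternative when the base classifiers differ in reliability.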
