Protein Subcellular Localization Prediction Based on Profile Alignment and Gene Ontology

2011 IEEE International Workshop on Machine Learning for Signal Processing

The functions of proteins are closely related to their subcellular locations. Computational methods are needed to replace laborious and time-consuming experimental processes in proteomics research. This paper proposes combining homology-based profile-alignment methods with functional-domain-based Gene Ontology (GO) methods to predict the subcellular locations of proteins. The feature vectors constructed by the two methods are classified by support vector machine (SVM) classifiers, and the resulting scores are fused to improve classification performance. The paper also investigates different approaches to constructing GO vectors from the GO terms returned by InterProScan. The results show that the GO methods perform comparably to profile-alignment methods and outperform methods based on amino-acid composition, and that fusing the two methods can outperform either method alone.
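As a rough illustration of two ideas mentioned in the abstract, the sketch below builds a GO vector from a list of returned GO terms in two plausible ways (term presence versus term counts) and fuses per-class scores from two classifiers by a weighted sum. This is not the paper's implementation: the function names, the fusion weight, the three-term vocabulary, and the score values are all assumptions chosen for illustration, and the abstract does not specify which vector constructions or fusion rule were actually used.

```python
from collections import Counter


def go_vector(terms, vocabulary, use_counts=False):
    """Map a protein's GO terms onto a fixed vocabulary (hypothetical helper).

    use_counts=False gives a 0/1 presence vector; use_counts=True gives
    the number of times each vocabulary term appears in `terms`.
    """
    counts = Counter(terms)
    if use_counts:
        return [counts[t] for t in vocabulary]
    return [1 if counts[t] > 0 else 0 for t in vocabulary]


def fuse_scores(go_scores, profile_scores, weight=0.5):
    """Weighted sum of per-class scores from two classifiers (assumed rule)."""
    if len(go_scores) != len(profile_scores):
        raise ValueError("score lists must cover the same classes")
    return [weight * g + (1.0 - weight) * p
            for g, p in zip(go_scores, profile_scores)]


def predict(fused_scores, class_names):
    """Return the class name with the highest fused score."""
    best = max(range(len(fused_scores)), key=fused_scores.__getitem__)
    return class_names[best]


# Illustrative vocabulary and term list; a real vocabulary would be far larger.
vocabulary = ["GO:0005634", "GO:0005737", "GO:0005739"]
terms = ["GO:0005634", "GO:0005634", "GO:0005739"]

presence = go_vector(terms, vocabulary)
frequency = go_vector(terms, vocabulary, use_counts=True)

# Made-up per-class scores standing in for the outputs of two SVM classifiers.
class_names = ["nucleus", "cytoplasm", "mitochondrion"]
go_scores = [0.2, -0.5, 1.0]
profile_scores = [0.9, -0.1, 0.3]

fused = fuse_scores(go_scores, profile_scores, weight=0.5)
label = predict(fused, class_names)
print(presence, frequency, label)
```

In practice the fusion weight would be tuned on held-out data rather than fixed at 0.5, and the per-class scores would come from trained classifiers instead of hand-written lists.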
