WegoLoc: accurate prediction of protein subcellular localization using weighted Gene Ontology terms

SUMMARY: We present WegoLoc, an accurate and fast web server for predicting the subcellular localization of proteins based on sequence similarity and weighted Gene Ontology (GO) information. A term-weighting method from text categorization is applied to the GO terms used as input features for a support vector machine (SVM) classifier. As a result, WegoLoc outperforms state-of-the-art methods on previously used test datasets. WegoLoc supports three eukaryotic kingdoms (animals, fungi and plants), provides human-specific analysis and covers several sets of cellular locations. In addition, WegoLoc provides (i) multiple possible localizations of the input protein(s) together with their probability scores, (ii) the weight of each GO term, representing its contribution to the prediction, and (iii) the BLAST E-value of the best hit with GO terms. If the similarity score does not meet a given threshold, an amino acid composition-based prediction is applied as a backup method.

AVAILABILITY: WegoLoc and its user's guide are freely available at http://www.btool.org/WegoLoc

CONTACT: smchiks@ks.ac.kr; dougnam@unist.ac.kr

SUPPLEMENTARY INFORMATION: Supplementary data are available at http://www.btool.org/WegoLoc.
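The summary combines two ideas: GO terms are weighted the way words are weighted in text categorization, and an E-value threshold switches prediction to amino acid composition when no sufficiently similar annotated protein is found. The Python sketch below illustrates both, under loudly stated assumptions. It uses a TF-IDF-style weight (binary term presence times inverse document frequency, L2-normalized), which is one common text-categorization scheme and not necessarily the one WegoLoc uses. The function names, the toy training sets and the cutoff `evalue_threshold=1e-3` are hypothetical. The SVM classifier that would consume these feature vectors is omitted.

```python
import math
from collections import Counter

# The 20 standard amino acids used by the composition-based fallback.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def idf_weights(training_go_sets):
    """Inverse document frequency of each GO term (hypothetical helper).

    Each training protein's set of GO terms plays the role of a 'document'
    in text categorization; terms shared by many proteins get low weight.
    """
    n = len(training_go_sets)
    doc_freq = Counter(term for terms in training_go_sets for term in set(terms))
    return {term: math.log(n / df) for term, df in doc_freq.items()}


def weighted_go_vector(go_terms, idf):
    """TF-IDF-style vector: binary presence times IDF, then L2-normalized.

    GO terms never seen in training are ignored; returns {} if none remain.
    """
    vec = {term: idf[term] for term in set(go_terms) if term in idf}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    if norm == 0.0:
        return {}
    return {term: w / norm for term, w in vec.items()}


def amino_acid_composition(sequence):
    """Fraction of each standard residue; non-standard letters are skipped."""
    residues = [aa for aa in sequence.upper() if aa in AMINO_ACIDS]
    counts = Counter(residues)
    total = len(residues)
    return {aa: (counts[aa] / total if total else 0.0) for aa in AMINO_ACIDS}


def select_features(sequence, best_hit_evalue, go_terms, idf, evalue_threshold=1e-3):
    """Hypothetical backup rule: use weighted GO features when the best BLAST
    hit with GO terms is significant enough, otherwise fall back to
    amino acid composition. evalue_threshold is illustrative only, not
    WegoLoc's actual cutoff.
    """
    if best_hit_evalue is not None and best_hit_evalue <= evalue_threshold:
        go_vec = weighted_go_vector(go_terms, idf)
        if go_vec:
            return "go", go_vec
    return "aac", amino_acid_composition(sequence)


# Toy usage with made-up annotation sets standing in for a training set.
train = [
    {"GO:0005634", "GO:0003677"},
    {"GO:0005634", "GO:0005737"},
    {"GO:0005737", "GO:0005524"},
    {"GO:0005739"},
]
idf = idf_weights(train)
print(select_features("MKTAYIAK", 1e-30, {"GO:0005634", "GO:0005739"}, idf))
print(select_features("AAC", 5.0, {"GO:0005634"}, idf)[0])  # prints: aac
```

In this sketch a rarer term such as `GO:0005739` (one of four training proteins) receives a larger weight than `GO:0005634` (two of four), which is the property that lets per-term weights be reported as each term's contribution. A query whose best hit has a poor E-value, or whose GO terms were never seen in training, falls through to the composition vector.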
