iLoc-Hum: using the accumulation-label scale to predict subcellular locations of human proteins with both single and multiple sites.

Although numerous efforts have been made to predict the subcellular locations of proteins from their sequence information, the problem remains challenging, particularly when query proteins have the multiplex character, i.e., they simultaneously exist in, or move between, two or more different subcellular location sites. Most existing methods were built on the assumption that a protein has one, and only one, subcellular location. However, recent evidence indicates that an increasing number of human proteins have multiple subcellular locations. Such multiplex proteins should not be ignored, because they may carry special biological functions worthy of attention. Based on the accumulation-label scale, a new predictor, called iLoc-Hum, was developed for identifying the subcellular localization of human proteins with both single and multiple location sites. As a demonstration, jackknife cross-validation was performed with iLoc-Hum on a benchmark dataset of human proteins covering the following 14 location sites: centrosome, cytoplasm, cytoskeleton, endoplasmic reticulum, endosome, extracellular, Golgi apparatus, lysosome, microsome, mitochondrion, nucleus, peroxisome, plasma membrane, and synapse. Some proteins in the dataset belong to two, three or four locations, but none has 25% or higher pairwise sequence identity to any other protein in the same subset. For such a complicated and stringent system, the overall success rate achieved by iLoc-Hum was 76%, remarkably higher than that of any existing predictor capable of handling this kind of system. Further comparisons were made on two independent datasets; in both cases the success rates of iLoc-Hum were even more markedly higher than those of its counterparts. As a user-friendly web-server, iLoc-Hum is freely accessible to the public at or . 
For the convenience of experimental scientists, a step-by-step guide explains how to use the web-server to obtain the desired results through either a straightforward submission or a batch submission, without needing to follow the complicated mathematical equations involved.
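The abstract does not spell out how the accumulation-label scale works, so the sketch below only illustrates the general idea of multi-label prediction by accumulating neighbour labels, evaluated with a jackknife (leave-one-out) test. Everything in it is an assumption for illustration: the toy 2-D feature vectors, the Euclidean k-nearest-neighbour rule, the functions `predict_sites` and `jackknife_success_rate`, and the criterion that a prediction counts as correct only when the predicted site set equals the true set exactly. It is not the published iLoc-Hum algorithm, and the paper's overall success rate may be defined differently.

```python
import math
from collections import Counter
from typing import List, Sequence, Set

Vector = Sequence[float]


def distance(a: Vector, b: Vector) -> float:
    """Euclidean distance between two (hypothetical) feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def predict_sites(query: Vector, train_X: List[Vector],
                  train_Y: List[Set[str]], k: int = 3) -> Set[str]:
    """Hypothetical label-accumulation rule, not the published iLoc-Hum scale.

    The site labels of the k nearest training proteins are accumulated into a
    score per site; the number of sites returned is the rounded mean size of
    the neighbours' label sets, and the highest-scoring sites are kept.
    """
    order = sorted(range(len(train_X)), key=lambda i: distance(query, train_X[i]))
    neighbours = order[:k]
    scores: Counter = Counter()
    for i in neighbours:
        scores.update(train_Y[i])  # each site of neighbour i adds 1 to its score
    mean_size = sum(len(train_Y[i]) for i in neighbours) / len(neighbours)
    n_sites = max(1, round(mean_size))
    # Rank sites by accumulated score (descending); break ties alphabetically.
    ranked = sorted(scores, key=lambda site: (-scores[site], site))
    return set(ranked[:n_sites])


def jackknife_success_rate(X: List[Vector], Y: List[Set[str]], k: int = 3) -> float:
    """Jackknife test: each protein is predicted from all the others, and a
    prediction counts only if it matches the true site set exactly (assumed)."""
    correct = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_Y = Y[:i] + Y[i + 1:]
        if predict_sites(X[i], train_X, train_Y, k) == Y[i]:
            correct += 1
    return correct / len(X)


if __name__ == "__main__":
    # Toy vectors and site sets, invented purely for illustration.
    X = [[0, 0], [0, 1], [1, 0], [1, 1], [10, 10], [10, 11], [11, 10]]
    Y = [{"nucleus"}, {"nucleus"}, {"nucleus"}, {"nucleus", "cytoplasm"},
         {"plasma membrane"}, {"plasma membrane"}, {"plasma membrane"}]
    print(jackknife_success_rate(X, Y, k=3))
```

On this toy data the jackknife rate is 6/7: the single two-site protein at `[1, 1]` is predicted as nucleus only, because its nearest neighbours are all single-site. This shows why an exact-set criterion makes multiplex proteins the hardest part of such a benchmark.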