Recent progress in predicting protein sub-subcellular locations

In the last two decades, the number of the known protein sequences increased very rapidly. However, a knowledge of protein function only exists for a small portion of these sequences. Since the experimental approaches for determining protein functions are costly and time consuming, in silico methods have been introduced to bridge the gap between knowledge of protein sequences and their functions. Knowing the subcellular location of a protein is considered to be a critical step in understanding its biological functions. Many efforts have been undertaken to predict the protein subcellular locations in silico. With the accumulation of available data, the substructures of some subcellular organelles, such as the cell nucleus, mitochondria and chloroplasts, have been taken into consideration by several studies in recent years. These studies create a new research topic, namely ‘protein sub-subcellular location prediction’, which goes one level deeper than classic protein subcellular location prediction.

[1]  Wendy A Bickmore,et al.  Addressing protein localization within the nucleus , 2002, The EMBO journal.

[2]  Thierry Denoeux,et al.  An evidence-theoretic k-NN rule with parameter optimization , 1998, IEEE Trans. Syst. Man Cybern. Part C.

[3]  P. Kucera,et al.  Novel method for evaluation of multi-class area under receiver operating characteristic , 2009, 2009 Fifth International Conference on Soft Computing, Computing with Words and Perceptions in System Analysis, Decision and Control.

[4]  Yang Dai,et al.  Assessing protein similarity with Gene Ontology and its use in subnuclear localization prediction , 2006, BMC Bioinformatics.

[5]  Minoru Kanehisa,et al.  AAindex: amino acid index database, progress report 2008 , 2007, Nucleic Acids Res..

[6]  Xiaoqi Zheng,et al.  Prediction of subcellular location of apoptosis proteins using pseudo amino acid composition: an approach from auto covariance transformation. , 2010, Protein and peptide letters.

[7]  Lin Zhi-Hua,et al.  Estimation of affinity of HLA-A*0201 restricted CTL epitope based on the SCORE function. , 2009, Protein and peptide letters.

[8]  Baris E. Suzek,et al.  The Universal Protein Resource (UniProt) in 2010 , 2009, Nucleic Acids Res..

[9]  Hassan Mohabatkar,et al.  Prediction of cyclin proteins using Chou's pseudo amino acid composition. , 2010, Protein and peptide letters.

[10]  Gil Alterovitz,et al.  Automation in Proteomics and Genomics: An Engineering Case-Based Approach , 2009 .

[11]  K. Chou,et al.  Recent progress in protein subcellular location prediction. , 2007, Analytical biochemistry.

[12]  G von Heijne,et al.  Prediction of organellar targeting signals. , 2001, Biochimica et biophysica acta.

[13]  F.-M. Li,et al.  Using pseudo amino acid composition to predict protein subnuclear location with improved hybrid approach , 2007, Amino Acids.

[14]  Zeng-Chang Qin,et al.  ROC analysis for predictions made by probabilistic classifiers , 2005, 2005 International Conference on Machine Learning and Cybernetics.

[15]  Loris Nanni,et al.  Ensemblator: An ensemble of classifiers for reliable classification of biological data , 2007, Pattern Recognit. Lett..

[16]  Jean-Philippe Vert,et al.  A novel representation of protein sequences for prediction of subcellular location using support vector machines , 2005, Protein science : a publication of the Protein Society.

[17]  Yanzhi Guo,et al.  Using the augmented Chou's pseudo amino acid composition for predicting protein submitochondria locations based on auto covariance approach. , 2009, Journal of theoretical biology.

[18]  Chun Yan,et al.  Prediction of protein subcellular location using a combined feature of sequence , 2005, FEBS letters.

[19]  Thomas L. Madden,et al.  Improving the accuracy of PSI-BLAST protein database searches with composition-based statistics and other refinements. , 2001, Nucleic acids research.

[20]  Yanda Li,et al.  Prediction of C-to-U RNA editing sites in plant mitochondria using both biochemical and evolutionary information. , 2008, Journal of theoretical biology.

[21]  Kuo-Chen Chou,et al.  Predicting eukaryotic protein subcellular location by fusing optimized evidence-theoretic K-Nearest Neighbor classifiers. , 2006, Journal of proteome research.

[22]  K. Chou,et al.  Hum-PLoc: a novel ensemble classifier for predicting human protein subcellular localization. , 2006, Biochemical and biophysical research communications.

[23]  Shiow-Fen Hwang,et al.  ProLoc: Prediction of protein subnuclear localization using SVM with automatic selection from physicochemical composition features , 2007, Biosyst..

[24]  Hao Lin,et al.  Predicting subcellular localization of mycobacterial proteins by using Chou's pseudo amino acid composition. , 2008, Protein and peptide letters.

[25]  Thierry Denoeux,et al.  A k-nearest neighbor classification rule based on Dempster-Shafer theory , 1995, IEEE Trans. Syst. Man Cybern..

[26]  K. Chou Prediction of protein cellular attributes using pseudo‐amino acid composition , 2001, Proteins.

[27]  Wen-Lian Hsu,et al.  Protein subcellular localization prediction of eukaryotes using a knowledge-based approach , 2009 .

[28]  Tianzi Jiang,et al.  Esub8: A novel tool to predict protein subcellular localizations in eukaryotic organisms , 2004, BMC Bioinformatics.

[29]  Paul Horton,et al.  Nucleic Acids Research Advance Access published May 21, 2007 WoLF PSORT: protein localization predictor , 2007 .

[30]  Qiang Yang,et al.  Semi-supervised protein subcellular localization , 2009, BMC Bioinformatics.

[31]  Cajo J. F. ter Braak,et al.  Predicting sub-Golgi localization of type II membrane proteins , 2008, Bioinform..

[32]  Robert F. Murphy,et al.  Towards a Systematics for Protein Subcellular Location: Quantitative Description of Protein Localization Patterns and Automated Analysis of Fluorescence Microscope Images , 2000, ISMB.

[33]  Piero Fariselli,et al.  BaCelLo: a balanced subcellular localization predictor , 2006, ISMB.

[34]  K. Chou,et al.  Cell-PLoc: a package of Web servers for predicting subcellular localization of proteins in various organisms , 2008, Nature Protocols.

[35]  S. Vilar,et al.  A network-QSAR model for prediction of genetic-component biomarkers in human colorectal cancer. , 2009, Journal of theoretical biology.

[36]  Shinn-Ying Ho,et al.  Inheritable genetic algorithm for biobjective 0/1 combinatorial optimization problems and its applications , 2004, IEEE Trans. Syst. Man Cybern. Part B.

[37]  Gertraud Burger,et al.  'Unite and conquer': enhanced prediction of protein subcellular localization by integrating multiple specialized tools , 2007, BMC Bioinformatics.

[38]  Tom Fawcett,et al.  An introduction to ROC analysis , 2006, Pattern Recognit. Lett..

[39]  K. Chou,et al.  Plant-mPLoc: A Top-Down Strategy to Augment the Power for Predicting Plant Protein Subcellular Localization , 2010, PloS one.

[40]  Guo-Ping Zhou,et al.  Subcellular location prediction of apoptosis proteins , 2002, Proteins.

[41]  Guoli Wang,et al.  PISCES: a protein sequence culling server , 2003, Bioinform..

[42]  Kuo-Chen Chou,et al.  A top-down approach to enhance the power of predicting human protein subcellular localization: Hum-mPLoc 2.0. , 2009, Analytical biochemistry.

[43]  Kuo-Chen Chou,et al.  Methodology development for predicting subcellular localization and other attributes of proteins , 2007, Expert review of proteomics.

[44]  Wen-Chi Chou,et al.  Sequence analysis Advance Access publication August 23, 2010 , 2010 .

[45]  Loris Nanni,et al.  Genetic programming for creating Chou’s pseudo amino acid based features for submitochondria localization , 2008, Amino Acids.

[46]  Oliver Kohlbacher,et al.  MultiLoc2: integrating phylogeny and Gene Ontology terms improves subcellular protein localization prediction , 2009, BMC Bioinformatics.

[47]  Fenglin Lv,et al.  Toward prediction of binding affinities between the MHC protein and its peptide ligands using quantitative structure-affinity relationship approach. , 2008, Protein and peptide letters.

[48]  Shiow-Fen Hwang,et al.  ProLoc-GO: Utilizing informative Gene Ontology terms for sequence-based prediction of protein subcellular localization , 2008, BMC Bioinformatics.

[49]  B. Rost,et al.  Mimicking cellular sorting improves prediction of subcellular localization. , 2005, Journal of molecular biology.

[50]  Yang Dai,et al.  An SVM-based system for predicting protein subnuclear localizations , 2005, BMC Bioinformatics.

[51]  Kuo-Chen Chou,et al.  Predicting protein subnuclear location with optimized evidence-theoretic K-nearest classifier and pseudo amino acid composition. , 2005, Biochemical and biophysical research communications.

[52]  Glenn Shafer,et al.  A Mathematical Theory of Evidence , 2020, A Mathematical Theory of Evidence.

[53]  Kuo-Chen Chou,et al.  Nuc-PLoc: a new web-server for predicting protein subnuclear localization by fusing PseAA composition and PsePSSM. , 2007, Protein engineering, design & selection : PEDS.

[54]  Graham Dellaire,et al.  The Nuclear Protein Database (NPD): sub-nuclear localisation and functional annotation of the nuclear proteome , 2003, Nucleic Acids Res..

[55]  K. Chou,et al.  Virus-PLoc: a fusion classifier for predicting the subcellular localization of viral proteins within host and virus-infected cells. , 2007, Biopolymers.

[56]  K. Chou,et al.  Prediction of protein subcellular locations by incorporating quasi-sequence-order effect. , 2000, Biochemical and biophysical research communications.

[57]  K. Chou,et al.  Virus-mPLoc: A Fusion Classifier for Viral Protein Subcellular Location Prediction by Incorporating Multiple Sites , 2010, Journal of biomolecular structure & dynamics.

[58]  Bing Niu,et al.  Prediction of small molecules' metabolic pathways based on functional group composition. , 2009, Protein and peptide letters.

[59]  T. Hubbard,et al.  Using neural networks for prediction of the subcellular location of proteins. , 1998, Nucleic acids research.

[60]  Zhirong Sun,et al.  Support vector machine approach for protein subcellular localization prediction , 2001, Bioinform..

[61]  Michelle S. Scott,et al.  Predicting subcellular localization via protein motif co-occurrence. , 2004, Genome research.

[62]  Zheng Yuan Prediction of protein subcellular locations using Markov chain models , 1999, FEBS letters.

[63]  Xiaoyong Zou,et al.  Predicting protein structural class based on multi-features fusion. , 2008, Journal of theoretical biology.

[64]  Xinglai Ji,et al.  BSubLoc: database of protein subcellular localization , 2004, Nucleic Acids Res..

[65]  Kuo-Chen Chou,et al.  Predicting 22 protein localizations in budding yeast. , 2004, Biochemical and biophysical research communications.

[66]  Volker Brendel,et al.  PROSET-a fast procedure to create non-redundant sets of protein sequences , 1992 .

[67]  K. Chou,et al.  Prediction of protein structural classes. , 1995, Critical reviews in biochemistry and molecular biology.

[68]  Loris Nanni,et al.  High performance set of PseAAC and sequence based descriptors for protein classification. , 2010, Journal of theoretical biology.

[69]  Yanda Li,et al.  SubChlo: predicting protein subchloroplast locations with pseudo-amino acid composition and the evidence-theoretic K-nearest neighbor (ET-KNN) algorithm. , 2009, Journal of theoretical biology.

[70]  Hongzhan Huang,et al.  Challenges and solutions in proteomics. , 2007, Current genomics.

[71]  K. Chou,et al.  Prediction of protein subcellular locations by GO-FunD-PseAA predictor. , 2004, Biochemical and biophysical research communications.

[72]  Shinn-Ying Ho,et al.  Predicting protein subnuclear localization using GO-amino-acid composition features , 2009, Biosyst..

[73]  Ying Gao,et al.  Bioinformatics Applications Note Sequence Analysis Cd-hit Suite: a Web Server for Clustering and Comparing Biological Sequences , 2022 .

[74]  Eleazar Eskin,et al.  The Spectrum Kernel: A String Kernel for SVM Protein Classification , 2001, Pacific Symposium on Biocomputing.

[75]  K. Chou,et al.  Protein subcellular location prediction. , 1999, Protein engineering.

[76]  Hao Lin The modified Mahalanobis Discriminant for predicting outer membrane proteins by using Chou's pseudo amino acid composition. , 2008, Journal of theoretical biology.

[77]  De-Shuang Huang,et al.  A protein interaction network analysis for yeast integral membrane protein. , 2008, Protein and peptide letters.

[78]  Burkhard Rost,et al.  Bioinformatics predictions of localization and targeting. , 2010, Methods in molecular biology.

[79]  Ying Huang,et al.  Prediction of protein subcellular locations using fuzzy k-NN method , 2004, Bioinform..

[80]  Kuo-Chen Chou,et al.  Large‐scale plant protein subcellular location prediction , 2007, Journal of cellular biochemistry.

[81]  Oliver Kohlbacher,et al.  Going from where to why—interpretable prediction of protein subcellular localization , 2010, Bioinform..

[82]  R. Casadio,et al.  The prediction of protein subcellular localization from sequence: a shortcut to functional genome annotation. , 2008, Briefings in functional genomics & proteomics.

[83]  K. Nakai,et al.  PSORT: a program for detecting sorting signals in proteins and predicting their subcellular localization. , 1999, Trends in biochemical sciences.

[84]  Yanda Li,et al.  Prediction of protein submitochondria locations by hybridizing pseudo-amino acid composition with various physicochemical features of segmented sequence , 2006, BMC Bioinformatics.

[85]  B. Rost,et al.  Adaptation of protein surfaces to subcellular location. , 1998, Journal of molecular biology.

[86]  HuangYing,et al.  CD-HIT Suite , 2010 .

[87]  K. Chou,et al.  Cell-PLoc 2.0: an improved package of web-servers for predicting subcellular localization of proteins in various organisms , 2010 .

[88]  Rajani R Joshi,et al.  Characteristic peptides of protein secondary structural motifs. , 2010, Protein and peptide letters.

[89]  Zhanchao Li,et al.  Using Chou's amphiphilic pseudo-amino acid composition and support vector machine for prediction of enzyme subfamily classes. , 2007, Journal of theoretical biology.

[90]  P. Aloy,et al.  Relation between amino acid composition and cellular location of proteins. , 1997, Journal of molecular biology.

[91]  Guo-Ping Zhou,et al.  An Intriguing Controversy over Protein Structural Class Prediction , 1998, Journal of protein chemistry.

[92]  G P Zhou,et al.  Some insights into protein structural class prediction , 2001, Proteins.

[93]  Pietro Liò,et al.  Wavelet change-point prediction of transmembrane proteins , 2000, Bioinform..

[94]  Xiaoyong Zou,et al.  Prediction of protein secondary structure content by using the concept of Chou's pseudo amino acid composition and support vector machine. , 2009, Protein and peptide letters.

[95]  Kuo-Chen Chou,et al.  A New Method for Predicting the Subcellular Localization of Eukaryotic Proteins with Both Single and Multiple Sites: Euk-mPLoc 2.0 , 2010, PloS one.

[96]  M. Bhasin,et al.  Support Vector Machine-based Method for Subcellular Localization of Human Proteins Using Amino Acid Compositions, Their Order, and Similarity Search* , 2005, Journal of Biological Chemistry.

[97]  Kuo-Chen Chou,et al.  Predicting subcellular localization of proteins in a hybridization space , 2004, Bioinform..

[98]  Loris Nanni,et al.  A further step toward an optimal ensemble of classifiers for peptide classification, a case study: HIV protease. , 2009, Protein and peptide letters.

[99]  Fengmin Li,et al.  Predicting protein subcellular location using Chou's pseudo amino acid composition and improved hybrid approach. , 2008, Protein and peptide letters.

[100]  Kuo-Chen Chou,et al.  Predicting subcellular localization of proteins by hybridizing functional domain composition and pseudo‐amino acid composition , 2004, Journal of cellular biochemistry.

[101]  Xiao-ming Hu,et al.  Geometry preserving projections algorithm for predicting membrane protein types. , 2010, Journal of theoretical biology.

[102]  Wang Fei,et al.  Amino acid classification based spectrum kernel fusion for protein subnuclear localization , 2010, BMC Bioinformatics.

[103]  Yu-Dong Cai,et al.  Predicting subcellular location of proteins using integrated-algorithm method , 2010, Molecular Diversity.

[104]  B. Matthews Comparison of the predicted and observed secondary structure of T4 phage lysozyme. , 1975, Biochimica et biophysica acta.

[105]  Kuo-Chen Chou,et al.  Using amphiphilic pseudo amino acid composition to predict enzyme subfamily classes , 2005, Bioinform..

[106]  María Martín,et al.  The Universal Protein Resource (UniProt) in 2010 , 2010 .

[107]  K. Chou,et al.  Hum-mPLoc: an ensemble classifier for large-scale human protein subcellular location prediction by incorporating samples with multiple sites. , 2007, Biochemical and biophysical research communications.

[108]  K. Chou Pseudo Amino Acid Composition and its Applications in Bioinformatics, Proteomics and System Biology , 2009 .

[109]  Yu-Dong Cai,et al.  Classification of transcription factors using protein primary structure. , 2010, Protein and peptide letters.

[110]  Zhi-Ping Feng,et al.  An overview on predicting the subcellular location of a protein , 2002, Silico Biol..

[111]  Hao Lin,et al.  Prediction of cell wall lytic enzymes using Chou's amphiphilic pseudo amino acid composition. , 2009, Protein and peptide letters.

[112]  Yu Shyr,et al.  Improved prediction of lysine acetylation by support vector machines. , 2009, Protein and peptide letters.

[113]  Tongliang Zhang,et al.  Using Chou’s pseudo amino acid composition based on approximate entropy and an ensemble of AdaBoost classifiers to predict protein subnuclear location , 2008, Amino Acids.