SUBAcon: a consensus algorithm for unifying the subcellular localization data of the Arabidopsis proteome

MOTIVATION: Knowing the subcellular location of proteins is critical for understanding their function and for developing accurate networks that represent eukaryotic biological processes. Many computational tools have been developed to predict proteome-wide subcellular location, and abundant experimental data from green fluorescent protein (GFP) tagging or mass spectrometry (MS) are available for the model plant Arabidopsis. None of these approaches is error-free, so their results are often contradictory.

RESULTS: To help unify these multiple data sources, we have developed the SUBcellular Arabidopsis consensus (SUBAcon) algorithm, a naive Bayes classifier that integrates 22 computational prediction algorithms, experimental GFP and MS localizations, and protein-protein interaction and co-expression data to derive a consensus call and probability. SUBAcon classifies protein location in Arabidopsis more accurately than single predictors.

AVAILABILITY: SUBAcon is a useful tool for recovering proteome-wide subcellular locations of Arabidopsis proteins and is displayed in the SUBA3 database (http://suba.plantenergy.uwa.edu.au). The source code and input data are available through the SUBA3 server (http://suba.plantenergy.uwa.edu.au//SUBAcon.html), and the Arabidopsis SUbproteome REference (ASURE) training set can be accessed using the ASURE web portal (http://suba.plantenergy.uwa.edu.au/ASURE).
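The general idea of a naive Bayes consensus over several localization sources can be illustrated with a minimal Python sketch. This is not the SUBAcon implementation: the function names (`train_consensus`, `consensus_call`), the source names, the toy records, and the use of Laplace smoothing are all assumptions made for illustration. Each source is treated as a categorical feature that is conditionally independent given the true location, and a missing source is simply skipped.

```python
import math
from collections import Counter, defaultdict

def train_consensus(records, alpha=1.0):
    """Count how often each source makes each call, per true location.

    records: list of (calls, location), where calls maps a source name
    to the location that source reported. alpha is a Laplace smoothing
    constant (an illustrative choice, not taken from the paper).
    """
    class_counts = Counter(location for _, location in records)
    counts = defaultdict(lambda: defaultdict(Counter))  # source -> location -> call -> n
    vocab = defaultdict(set)                            # source -> set of observed calls
    for calls, location in records:
        for source, call in calls.items():
            counts[source][location][call] += 1
            vocab[source].add(call)
    return {"class_counts": class_counts, "counts": counts,
            "vocab": vocab, "alpha": alpha}

def consensus_call(model, calls):
    """Return (best_location, posterior) for one protein's source calls."""
    alpha = model["alpha"]
    total = sum(model["class_counts"].values())
    log_scores = {}
    for location, n in model["class_counts"].items():
        score = math.log(n / total)  # log prior of this location
        for source, call in calls.items():
            if source not in model["vocab"]:
                continue  # source never seen in training: ignore it
            seen = model["counts"][source][location]
            k = len(model["vocab"][source]) + (call not in model["vocab"][source])
            # Smoothed likelihood P(call | location) for this source
            score += math.log((seen[call] + alpha) / (sum(seen.values()) + alpha * k))
        log_scores[location] = score
    # Normalize in log space to get posterior probabilities
    top = max(log_scores.values())
    unnorm = {loc: math.exp(s - top) for loc, s in log_scores.items()}
    z = sum(unnorm.values())
    posterior = {loc: v / z for loc, v in unnorm.items()}
    return max(posterior, key=posterior.get), posterior

# Hypothetical toy training set: two sources may disagree for one protein.
toy_records = [
    ({"predictor_a": "mitochondrion", "gfp": "mitochondrion"}, "mitochondrion"),
    ({"predictor_a": "mitochondrion", "ms": "mitochondrion"}, "mitochondrion"),
    ({"predictor_a": "plastid", "gfp": "plastid"}, "plastid"),
    ({"predictor_a": "mitochondrion", "ms": "plastid"}, "plastid"),
]
toy_model = train_consensus(toy_records)
best, posterior = consensus_call(
    toy_model, {"predictor_a": "mitochondrion", "gfp": "mitochondrion"})
print(best, posterior)
```

In this sketch a source that is often wrong for a given location contributes a weaker likelihood ratio, so contradictory calls are weighed by each source's observed reliability rather than by a simple vote.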
