SUBA3: a database for integrating experimentation and prediction to define the SUBcellular location of proteins in Arabidopsis

The subcellular location database for Arabidopsis proteins (SUBA3, http://suba.plantenergy.uwa.edu.au) combines manual literature curation of large-scale subcellular proteomics, fluorescent protein visualization and protein–protein interaction (PPI) datasets with subcellular targeting calls from 22 prediction programs. More than 14 500 new experimental locations have been added since its first release in 2007. Overall, nearly 650 000 new calls of subcellular location for 35 388 non-redundant Arabidopsis proteins are included (almost six times the information in the previous SUBA version). A re-designed interface makes the SUBA3 site more intuitive and easier to use than earlier versions and provides powerful options to search for PPIs within the context of cell compartmentation. SUBA3 also includes detailed localization information for reference organelle datasets and incorporates green fluorescent protein (GFP) images for many proteins. To determine as objectively as possible where a particular protein is located, we have developed SUBAcon, a Bayesian approach that incorporates experimental localization and targeting prediction data to best estimate a protein’s location in the cell. The probabilities of subcellular location for each protein are provided and displayed as a pictographic heat map of a plant cell in SUBA3.
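The idea of combining several lines of evidence into per-compartment probabilities can be illustrated with a minimal naive Bayes sketch. This is not the SUBAcon implementation: the compartment list, the uniform prior, the likelihood values and the function name `combine_evidence` are all hypothetical, and the sketch assumes that the evidence sources are conditionally independent given the true location.

```python
from math import prod

# Hypothetical compartment list; SUBA3 itself distinguishes more locations.
COMPARTMENTS = ["mitochondrion", "plastid", "nucleus", "cytosol"]


def combine_evidence(prior, likelihoods):
    """Return P(compartment | all evidence) under a naive Bayes assumption.

    prior:       dict mapping compartment -> P(compartment)
    likelihoods: list of dicts, one per evidence source, each mapping
                 compartment -> P(observed evidence | compartment)
    """
    # Unnormalised posterior: prior times the product of the likelihoods.
    unnormalised = {
        c: prior[c] * prod(source[c] for source in likelihoods)
        for c in prior
    }
    total = sum(unnormalised.values())
    if total == 0:
        raise ValueError("evidence rules out every compartment")
    # Normalise so the probabilities sum to 1.
    return {c: value / total for c, value in unnormalised.items()}


# Illustrative numbers only (assumed, not taken from SUBA3).
uniform_prior = {c: 1 / len(COMPARTMENTS) for c in COMPARTMENTS}
gfp_evidence = {"mitochondrion": 0.80, "plastid": 0.10,
                "nucleus": 0.05, "cytosol": 0.05}
predictor_evidence = {"mitochondrion": 0.60, "plastid": 0.30,
                      "nucleus": 0.05, "cytosol": 0.05}

posterior = combine_evidence(uniform_prior, [gfp_evidence, predictor_evidence])
best_location = max(posterior, key=posterior.get)
```

The resulting `posterior` dictionary is the kind of per-compartment probability vector that could be rendered as a heat map over a cell diagram. In practice the likelihoods would need to be estimated from reference datasets rather than chosen by hand.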
