LOC3D: annotate sub-cellular localization for protein structures

LOC3D (http://cubic.bioc.columbia.edu/db/LOC3d/) is both a weekly-updated database and a web server for predictions of sub-cellular localization for eukaryotic proteins of known three-dimensional (3D) structure. Localization is predicted using four different methods: (i) PredictNLS, prediction of nuclear proteins through nuclear localization signals; (ii) LOChom, inferring localization through sequence homology; (iii) LOCkey, inferring localization through automatic text analysis of SWISS-PROT keywords; and (iv) LOC3Dini, ab initio prediction through a system of neural networks and vector support machines. The final prediction is based on the method that predicts localization with the highest confidence. The LOC3D database currently contains predictions for >8700 eukaryotic protein chains taken from the Protein Data Bank (PDB). The web server can be used to predict sub-cellular localization for proteins for which only a predicted structure is available from threading servers. This makes the resource of particular interest to structural genomics initiatives.

[1]  D. Eisenberg,et al.  Protein function in the post-genomic era , 2000, Nature.

[2]  E V Koonin,et al.  Bridging the gap between sequence and function. , 2000, Trends in genetics : TIG.

[3]  Amos Bairoch,et al.  The PROSITE database, its status in 2002 , 2002, Nucleic Acids Res..

[4]  M. Ashburner,et al.  Annotating eukaryote genomes. , 2000, Current opinion in structural biology.

[5]  B Rost,et al.  Bridging the protein sequence-structure gap by structure predictions. , 1996, Annual review of biophysics and biomolecular structure.

[6]  Burkhard Rost,et al.  NLSdb: database of nuclear localization signals , 2003, Nucleic Acids Res..

[7]  Burkhard Rost,et al.  Inferring sub-cellular localization through automated lexical analysis , 2002, ISMB.

[8]  Yan P. Yuan,et al.  Predicting function: from genes to genomes and back. , 1998, Journal of molecular biology.

[9]  Rolf Apweiler,et al.  The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000 , 2000, Nucleic Acids Res..

[10]  Yiming Yang,et al.  A re-examination of text categorization methods , 1999, SIGIR '99.

[11]  K. Nakai Protein sorting signals and prediction of subcellular localization. , 2000, Advances in protein chemistry.

[12]  Rolf Apweiler,et al.  The SWISS-PROT protein sequence data bank and its supplement TrEMBL , 1997, Nucleic Acids Res..

[13]  Leszek Rychlewski,et al.  ELM server: a new resource for investigating short functional sites in modular eukaryotic proteins , 2003, Nucleic Acids Res..

[14]  Amos Bairoch,et al.  The PROSITE database, its status in 1997 , 1997, Nucleic Acids Res..

[15]  Peer Bork,et al.  SMART: identification and annotation of domains from signalling and extracellular protein sequences , 1999, Nucleic Acids Res..

[16]  Amos Bairoch,et al.  The PROSITE database, its status in 1999 , 1999, Nucleic Acids Res..

[17]  Larry Wall,et al.  Programming Perl , 1991 .

[18]  Burkhard Rost,et al.  Sequence conserved for subcellular localization , 2002, Protein science : a publication of the Protein Society.

[19]  L D Stein Using Perl to facilitate biological analysis. , 2001, Methods of biochemical analysis.

[20]  B. Rost Twilight zone of protein sequence alignments. , 1999, Protein engineering.

[21]  C. A. Andersen,et al.  Prediction of human protein function from post-translational modifications and localization features. , 2002, Journal of molecular biology.

[22]  Matthew R. Pocock,et al.  The Bioperl toolkit: Perl modules for the life sciences. , 2002, Genome research.

[23]  M. Hall,et al.  The T-DNA-linked VirD2 protein contains two distinct functional nuclear localization signals. , 1992, Proceedings of the National Academy of Sciences of the United States of America.

[24]  T. N. Bhat,et al.  The Protein Data Bank , 2000, Nucleic Acids Res..

[25]  C. Sander,et al.  Database of homology‐derived protein structures and the structural meaning of sequence alignment , 1991, Proteins.

[26]  R. Durbin,et al.  Pfam: A comprehensive database of protein domain families based on seed alignments , 1997, Proteins.

[27]  P Bork,et al.  Wanted: subcellular localization of proteins based on sequence. , 1998, Trends in cell biology.

[28]  Burkhard Rost,et al.  PEP: Predictions for Entire Proteomes , 2003, Nucleic Acids Res..

[29]  B. Rost,et al.  Finding nuclear localization signals , 2000, EMBO reports.

[30]  E. Myers,et al.  Basic local alignment search tool. , 1990, Journal of molecular biology.

[31]  Belur V. Dasarathy,et al.  Nearest neighbor (NN) norms: NN pattern classification techniques , 1991 .