The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003

The SWISS-PROT protein knowledgebase (http://www.expasy.org/sprot/ and http://www.ebi.ac.uk/swissprot/) connects amino acid sequences with current knowledge in the life sciences. Each protein entry provides an interdisciplinary overview of relevant information by bringing together experimental results, computed features and, in some cases, contradictory conclusions. Detailed expertise beyond the scope of SWISS-PROT is made available via direct links to specialised databases. SWISS-PROT provides annotated entries for all species, but concentrates on the annotation of entries from human (the HPI project) and other model organisms, to ensure high-quality annotation for representative members of all protein families. Part of this annotation can be transferred to other family members, as is already done for microbes by the High-quality Automated and Manual Annotation of microbial Proteomes (HAMAP) project. Protein families and groups of proteins are regularly reviewed to keep up with current scientific findings. TrEMBL complements SWISS-PROT: it aims to include all protein sequences not yet represented in SWISS-PROT, and carries a steadily increasing level of mostly automated annotation. Researchers are welcome to contribute their knowledge to the scientific community by submitting relevant findings to SWISS-PROT at swiss-prot@expasy.org.
