Databases and ontologies

Mapping PDB chains to UniProtKB entries

MOTIVATION: UniProtKB/SwissProt is the main resource for detailed annotations of protein sequences. Through its cross-references, it provides a jumping-off point to many other resources, including other primary databases, secondary databases, the Gene Ontology and OMIM. Although many links are provided to Protein Data Bank (PDB) files, obtaining a regularly updated mapping between UniProtKB and PDB entries at the chain or residue level is not straightforward. In particular, no regularly updated resource allows the UniProtKB/SwissProt entry corresponding to a given residue of a PDB file to be identified.

RESULTS: We have created a fully automatically maintained database that maps PDB residues to residues in UniProtKB/SwissProt and UniProtKB/TrEMBL entries. The protocol uses links from PDB to UniProtKB, links from UniProtKB to PDB, and a brute-force sequence scan to resolve PDB chains for which no annotated link is available. Finally, the sequences from PDB and UniProtKB are aligned to obtain a residue-level mapping.

AVAILABILITY: The resource may be queried interactively or downloaded from http://www.bioinf.org.uk/pdbsws/.
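The protocol summarized in the abstract (annotated PDB-to-UniProtKB link, then reverse UniProtKB-to-PDB link, then a brute-force sequence scan, followed by alignment for the residue-level mapping) can be illustrated with the minimal Python sketch below. It is not the resource's actual implementation: the function names (`global_align`, `identity`, `residue_map`, `resolve_chain`), the chain-key format, the scoring scheme and the `min_identity` threshold are all hypothetical, and a plain Needleman-Wunsch alignment stands in for whatever alignment and search tools the real pipeline uses.

```python
from typing import Dict, List, Optional, Tuple

# Illustrative scoring scheme (assumption, not the resource's real parameters).
MATCH, MISMATCH, GAP = 2, -1, -2


def _pair(x: str, y: str) -> int:
    """Score for aligning residue x against residue y."""
    return MATCH if x == y else MISMATCH


def global_align(a: str, b: str) -> Tuple[str, str]:
    """Needleman-Wunsch global alignment with a linear gap penalty."""
    n, m = len(a), len(b)
    s = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        s[i][0] = i * GAP
    for j in range(1, m + 1):
        s[0][j] = j * GAP
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s[i][j] = max(s[i - 1][j - 1] + _pair(a[i - 1], b[j - 1]),
                          s[i - 1][j] + GAP,
                          s[i][j - 1] + GAP)
    # Trace back from the bottom-right cell, always following an optimal predecessor.
    out_a: List[str] = []
    out_b: List[str] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and s[i][j] == s[i - 1][j - 1] + _pair(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and s[i][j] == s[i - 1][j] + GAP:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def identity(pdb_seq: str, up_seq: str) -> float:
    """Fraction of PDB residues aligned to an identical UniProtKB residue."""
    if not pdb_seq:
        return 0.0
    aln_p, aln_u = global_align(pdb_seq, up_seq)
    same = sum(1 for p, u in zip(aln_p, aln_u) if p == u and p != "-")
    return same / len(pdb_seq)


def residue_map(pdb_seq: str, up_seq: str) -> Dict[int, Optional[int]]:
    """Map 1-based PDB chain positions to 1-based UniProtKB positions (None = gap)."""
    aln_p, aln_u = global_align(pdb_seq, up_seq)
    mapping: Dict[int, Optional[int]] = {}
    i = j = 0
    for p, u in zip(aln_p, aln_u):
        if u != "-":
            j += 1
        if p != "-":
            i += 1
            mapping[i] = j if u != "-" else None
    return mapping


def resolve_chain(chain: str, pdb_seq: str,
                  pdb_to_uniprot: Dict[str, str],
                  uniprot_to_pdb: Dict[str, List[str]],
                  uniprot_seqs: Dict[str, str],
                  min_identity: float = 0.9) -> Optional[str]:
    """Pick a UniProtKB accession for a PDB chain (hypothetical resolution order)."""
    # 1. Link annotated in the PDB entry itself.
    if chain in pdb_to_uniprot:
        return pdb_to_uniprot[chain]
    # 2. Reverse link annotated in UniProtKB.
    for acc, chains in uniprot_to_pdb.items():
        if chain in chains:
            return acc
    # 3. Brute-force scan: best-identity sequence above a (hypothetical) threshold.
    best_acc, best_ident = None, 0.0
    for acc, seq in uniprot_seqs.items():
        ident = identity(pdb_seq, seq)
        if ident > best_ident:
            best_acc, best_ident = acc, ident
    return best_acc if best_ident >= min_identity else None
```

For example, `residue_map("ACDE", "MMACDEKK")` returns `{1: 3, 2: 4, 3: 5, 4: 6}`, that is, the four observed chain residues correspond to positions 3 to 6 of the full-length sequence. Two simplifications are worth flagging: the mapping uses sequential positions rather than PDB residue numbers (which may carry insertion codes or skip values), and a quadratic alignment per database entry would be far too slow for a real scan of UniProtKB, where a dedicated sequence-search tool would be the natural choice.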
