Sequence‐structure mapping errors in the PDB: OB‐fold domains

The Protein Data Bank (PDB) is the single most important repository of structural data for proteins and other biologically relevant molecules. It is therefore critically important to keep PDB data as error‐free as possible. In this study, we analyzed PDB crystal structures possessing the oligonucleotide/oligosaccharide binding (OB)‐fold, one of the highly populated folds, for the presence of sequence‐structure mapping errors. Using energy‐based structure quality assessment coupled with sequence analyses, we found at least five OB‐fold structures in the PDB that have regions where the sequence has been incorrectly mapped onto the structure. We demonstrate that the combination of these computational techniques is effective not only in detecting sequence‐structure mapping errors, but also in providing guidance for correcting them. Specifically, we used the results of the computational analysis to direct a revision of the X‐ray data for one PDB entry containing a fairly inconspicuous sequence‐structure mapping error. The revised structure has been deposited with the PDB. We suggest using computational energy assessment and sequence analysis techniques to facilitate structure determination when homologs of known structure are available as a reference. Such computational analysis may be useful either in guiding the sequence‐structure assignment process or in verifying the sequence mapping within poorly defined regions.
