E-MSD: improving data deposition and structure quality

The Macromolecular Structure Database (MSD) group [H. Boutselakis, D. Dimitropoulos, J. Fillon, A. Golovin, K. Henrick, A. Hussain, J. Ionides, M. John, P. A. Keller, E. Krissinel et al. (2003) E-MSD: the European Bioinformatics Institute Macromolecular Structure Database. Nucleic Acids Res., 31, 458–462.] is one of the three partners in the worldwide Protein Data Bank (wwPDB), the consortium entrusted with the collation, maintenance and distribution of the global repository of macromolecular structure data [H. Berman, K. Henrick and H. Nakamura (2003) Announcing the worldwide Protein Data Bank. Nature Struct. Biol., 10, 980.]. Since its inception, the MSD group has worked with partners around the world to improve the quality of PDB data through a clean-up programme that addresses inconsistencies and inaccuracies in the legacy archive. These improvements have been achieved largely through the creation of a unified data archive, in the form of a relational database that stores all of the data in the wwPDB. The three partners are also working to improve the tools and methods used by the community at large to deposit new data. The implementation of the MSD database, together with the parallel development of improved tools and methodologies for data harvesting, validation and archival, has led to significant improvements in the quality of data entering the archive. Through this and related projects in the NMR and EM realms, the MSD continues to improve the quality of publicly available structural data.
