Publication of nuclear magnetic resonance experimental data with semantic web technology and the application thereof to biomedical research of proteins

Background: The nuclear magnetic resonance (NMR) spectroscopic data for biological macromolecules archived at the BioMagResBank (BMRB) provide a rich resource of biophysical information at atomic resolution. The NMR data, archived in NMR-STAR ASCII format, have also been loaded into a relational database. However, it remains difficult for users to retrieve data from the NMR-STAR files or the relational database in combination with data from other biological databases.

Findings: To enhance the interoperability of the BMRB database, we present a full conversion of BMRB entries to two standard structured data formats, XML and RDF, as common open representations of the NMR-STAR data. In addition, a SPARQL endpoint has been deployed. The described case study demonstrates that a simple query across the SPARQL endpoints of the BMRB, UniProt, and Online Mendelian Inheritance in Man (OMIM) can be used in NMR and structure-based analysis of proteins, combined with information on single nucleotide polymorphisms (SNPs) and their phenotypes.

Conclusions: We have developed BMRB/XML and BMRB/RDF and demonstrate their use in performing a federated SPARQL query that links the BMRB to other databases through standard semantic web technologies. This will facilitate data exchange across diverse information resources.
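As a rough illustration of what a federated SPARQL query of this kind looks like, the sketch below assembles a query string with one `SERVICE` clause per endpoint. It is not the query used in the paper. The BMRB endpoint URL, the `ex:` namespace, and the `ex:uniprotAccession` predicate are hypothetical placeholders; the UniProt endpoint URL and the `up:` terms are written as we understand the public UniProt vocabulary and should be checked against its documentation. The function name `build_federated_query` is likewise introduced only for this example.

```python
from textwrap import dedent

# Hypothetical placeholder: the real BMRB endpoint URL is not given in this text.
BMRB_ENDPOINT = "https://example.org/bmrb/sparql"
# Assumed public UniProt endpoint; verify before use.
UNIPROT_ENDPOINT = "https://sparql.uniprot.org/sparql"


def build_federated_query(accession,
                          bmrb_endpoint=BMRB_ENDPOINT,
                          uniprot_endpoint=UNIPROT_ENDPOINT):
    """Return a SPARQL 1.1 federated query as a string (illustrative only)."""
    # Reject anything that is not a plain alphanumeric accession so the value
    # can be embedded in the query text safely.
    if not accession.isalnum():
        raise ValueError(f"unexpected accession: {accession!r}")
    return dedent(f"""\
        PREFIX ex: <https://example.org/bmrb/schema#>
        PREFIX up: <http://purl.uniprot.org/core/>
        PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

        SELECT ?entry ?annotation ?comment
        WHERE {{
          # NMR entries that reference the protein (ex: terms are hypothetical).
          SERVICE <{bmrb_endpoint}> {{
            ?entry ex:uniprotAccession "{accession}" .
          }}
          # Natural variant annotations for the same protein in UniProt.
          SERVICE <{uniprot_endpoint}> {{
            <http://purl.uniprot.org/uniprot/{accession}> up:annotation ?annotation .
            ?annotation a up:Natural_Variant_Annotation ;
                        rdfs:comment ?comment .
          }}
        }}
        LIMIT 10
        """)


if __name__ == "__main__":
    # Print the query text; sending it to an endpoint is left to the reader.
    print(build_federated_query("P51608"))
```

The string returned by `build_federated_query` could be submitted to any SPARQL 1.1 endpoint that supports federation. Building the query as text keeps the sketch free of network access and third-party libraries; a third `SERVICE` block for OMIM would follow the same pattern once the relevant endpoint and predicates are known.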
