BIGNASim: a NoSQL database structure and analysis portal for nucleic acids simulation data

Molecular dynamics (MD) simulation is, second only to genomics, the bioinformatics tool that generates the largest amounts of data and consumes the largest share of CPU time in supercomputing centres. MD trajectories are obtained after months of calculation, analysed in situ, and then, in practice, forgotten. Several projects have built stable trajectory databases for proteins, but no equivalent exists for nucleic acids. We present here a novel database system to store MD trajectories and analyses of nucleic acids. The initial data set consists mainly of the benchmark of the new molecular dynamics force field, parmBSC1. It contains 156 simulations, with over 120 μs of total simulation time. A deposition protocol is available for the submission of new trajectory data. The database combines two NoSQL engines: Cassandra to store trajectories, and MongoDB to store analysis results and simulation metadata. The analyses available include backbone geometries, helical analysis, NMR observables and a variety of mechanical analyses. Individual trajectories and combined meta-trajectories can be downloaded from the portal. The system is accessible at http://mmb.irbbarcelona.org/BIGNASim/. Supplementary Material is also available online at http://mmb.irbbarcelona.org/BIGNASim/SuppMaterial/.
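The split between a trajectory store and a metadata store can be illustrated with a minimal sketch. It uses in-memory Python stand-ins rather than real Cassandra or MongoDB clients, and every class, function and field name in it (FrameStore, MetadataStore, meta_trajectory, na_type, force_field) is hypothetical and not taken from the BIGNASim schema. It also assumes one plausible reading of a "meta-trajectory": frames gathered from every simulation whose metadata matches a query.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Coords = List[Tuple[float, float, float]]  # one (x, y, z) tuple per atom


@dataclass
class FrameStore:
    """In-memory stand-in for the trajectory store (the role Cassandra plays)."""
    rows: Dict[Tuple[str, int], Coords] = field(default_factory=dict)

    def put(self, sim_id: str, frame: int, coords: Coords) -> None:
        # Frames are keyed by (simulation id, frame number).
        self.rows[(sim_id, frame)] = coords

    def get_range(self, sim_id: str, start: int, stop: int) -> List[Coords]:
        # Return stored frames with start <= frame number < stop, in order.
        return [self.rows[(sim_id, i)]
                for i in range(start, stop) if (sim_id, i) in self.rows]


@dataclass
class MetadataStore:
    """In-memory stand-in for the metadata store (the role MongoDB plays)."""
    docs: Dict[str, dict] = field(default_factory=dict)

    def insert(self, doc: dict) -> None:
        self.docs[doc["_id"]] = doc

    def find(self, **criteria) -> List[dict]:
        # Exact-match filter on top-level fields, in insertion order.
        return [d for d in self.docs.values()
                if all(d.get(k) == v for k, v in criteria.items())]


def meta_trajectory(meta: MetadataStore, frames: FrameStore,
                    start: int, stop: int, **criteria) -> List[Coords]:
    """Concatenate a frame range from every simulation matching the criteria."""
    combined: List[Coords] = []
    for doc in meta.find(**criteria):
        combined.extend(frames.get_range(doc["_id"], start, stop))
    return combined


# Toy data: three single-atom "simulations" of three frames each.
frames = FrameStore()
meta = MetadataStore()
for sim_id, na_type in [("sim_a", "DNA"), ("sim_b", "DNA"), ("sim_c", "RNA")]:
    meta.insert({"_id": sim_id, "na_type": na_type, "force_field": "parmBSC1"})
    for i in range(3):
        frames.put(sim_id, i, [(float(i), 0.0, 0.0)])

# Query metadata first, then pull only the matching frames.
dna_frames = meta_trajectory(meta, frames, 0, 2, na_type="DNA")
print(len(dna_frames))  # 4: two DNA simulations, two frames each
```

The point of the sketch is the access pattern: small, queryable metadata documents select the simulations, and the bulky coordinate data is then read only for the frame ranges that are actually needed.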
