Bookshelf: a simple curation system for the storage of biomolecular simulation data

Molecular dynamics simulations can now routinely generate data sets of several hundred gigabytes. Generating such data has become easier in recent years, and the rate of data production is likely to increase rapidly in the near future. One major problem associated with this vast amount of data is how to store it so that it can be easily retrieved at a later date. The obvious answer to this problem is a database. However, a key issue in the development and maintenance of such a database is its sustainability, which in turn depends on how easy the deposition and retrieval processes are. Encouraging users to care about metadata is difficult, so the success of any storage system ultimately depends on how widely its end users adopt it. In this respect, we suggest that even a minimal amount of metadata, if stored sensibly, is useful, if only at the level of individual research groups. Here we describe a simple database system, which we call ‘Bookshelf’, that uses Python in conjunction with a MySQL database to provide a lightweight means of curating and keeping track of molecular simulation data. It provides a user-friendly, scriptable solution to a problem common amongst biomolecular simulation laboratories: the storage, logging and subsequent retrieval of large numbers of simulations. Download URL: http://sbcb.bioch.ox.ac.uk/bookshelf/
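To make the idea of a minimal, scriptable metadata store concrete, the following is a sketch under stated assumptions, not Bookshelf's actual code or schema. It uses Python's built-in sqlite3 module as a stand-in for MySQL so that it runs without a database server; the table name `simulations`, its columns, and the helper functions `open_shelf`, `deposit` and `find` are all hypothetical. Only a few metadata fields and a path to the raw trajectory files are stored, since the simulation data themselves can run to hundreds of gigabytes.

```python
import sqlite3
from datetime import date

# Hypothetical minimal schema (not Bookshelf's real one): one row per
# simulation, holding a few metadata fields plus a pointer to the raw data.
SCHEMA = """
CREATE TABLE IF NOT EXISTS simulations (
    id         INTEGER PRIMARY KEY AUTOINCREMENT,
    owner      TEXT NOT NULL,
    system     TEXT NOT NULL,
    md_engine  TEXT NOT NULL,
    length_ns  REAL,
    path       TEXT NOT NULL UNIQUE,
    deposited  TEXT NOT NULL
)
"""

# Columns that find() may filter on; whitelisting keeps column names out of
# user control, while values are always passed as bound parameters.
SEARCHABLE = {"owner", "system", "md_engine"}


def open_shelf(db_path=":memory:"):
    """Open (or create) the metadata database and ensure the table exists."""
    conn = sqlite3.connect(db_path)
    conn.execute(SCHEMA)
    return conn


def deposit(conn, owner, system, md_engine, path, length_ns=None):
    """Log one simulation's metadata and storage path; return its row id."""
    cur = conn.execute(
        "INSERT INTO simulations "
        "(owner, system, md_engine, length_ns, path, deposited) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (owner, system, md_engine, length_ns, path, date.today().isoformat()),
    )
    conn.commit()
    return cur.lastrowid


def find(conn, **criteria):
    """Return storage paths of simulations matching all exact-value criteria."""
    unknown = set(criteria) - SEARCHABLE
    if unknown:
        raise ValueError(f"unsupported fields: {sorted(unknown)}")
    # With no criteria, "WHERE 1" matches every row in SQLite.
    where = " AND ".join(f"{k} = ?" for k in criteria) or "1"
    rows = conn.execute(
        f"SELECT path FROM simulations WHERE {where} ORDER BY id",
        tuple(criteria.values()),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    shelf = open_shelf()
    deposit(shelf, "alice", "example_system_1", "GROMACS", "/archive/sim001", 100.0)
    deposit(shelf, "bob", "example_system_2", "NAMD", "/archive/sim002")
    print(find(shelf, md_engine="GROMACS"))  # ['/archive/sim001']
```

Moving this sketch to MySQL would mainly change the connection call, the parameter placeholder style (drivers such as mysql-connector-python use `%s` rather than `?`) and a little DDL syntax (`AUTO_INCREMENT` instead of `AUTOINCREMENT`).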