Sharing Data from Molecular Simulations

Given the need for modern researchers to produce open, reproducible scientific output, the lack of standards and best practices for sharing the data and workflows used to produce and analyze molecular dynamics (MD) simulations has become an important issue in the field. There are now multiple well-established packages for performing MD simulations, often highly tuned to exploit specific classes of hardware, each with a strong community around it but with very limited options for interoperability or transferability. Thus, the choice of software package often dictates the workflow for both simulation production and analysis. The level of detail in documenting workflows and analysis code varies greatly in published work, hindering the reproducibility of reported results and the ability of other researchers to build on these studies. An increasing number of researchers are motivated to make their data available, but many challenges remain before simulation data can be shared and reused effectively. To discuss these and other issues related to best practices in the field, we organized a workshop in November 2018 (https://bioexcel.eu/events/workshop-on-sharing-data-from-molecular-simulations/). Here, we present a brief overview of this workshop and the topics discussed. We hope this effort will spark further conversation in the MD community and pave the way towards more open, interoperable and reproducible outputs from research studies using MD simulations.
