BioExcel Building Blocks, a software library for interoperable biomolecular simulation workflows

In recent years, improvements in software and hardware performance have made biomolecular simulation a mature tool for the study of biological processes. The simulation lengths now attainable, together with the size and complexity of the systems analyzed, make simulations both complementary to and compatible with other bioinformatics disciplines. However, the characteristics of simulation software packages have hindered the adoption of technologies that are well established in other bioinformatics fields, such as automated deployment systems, workflow orchestration, and software containers. We present here a comprehensive exercise to bring biomolecular simulations to the “bioinformatics way of working”. The exercise has led to the development of the BioExcel Building Blocks (BioBB) library. BioBBs are built as Python wrappers to provide an interoperable architecture. They have been integrated into a chain of common software management tools to generate data ontologies, documentation, installation packages, software containers, and integrations with workflow managers, which make them usable in most computational environments.
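
The idea of a Python wrapper that gives command-line tools a uniform interface can be illustrated with a minimal sketch. This is not the BioBB API: the class names `CommandLineBlock` and `UppercaseBlock`, the `launch` method, and the `properties` dictionary are hypothetical names chosen for illustration, and the wrapped "tool" is just the Python interpreter standing in for a real simulation program.

```python
import subprocess
import sys
from pathlib import Path


class CommandLineBlock:
    """Hypothetical building block: one command-line tool behind a uniform interface.

    Every block takes an input file path, an output file path, and an optional
    dictionary of tool settings, so blocks can be chained without knowing
    anything about the tool each one wraps.
    """

    def __init__(self, input_path, output_path, properties=None):
        self.input_path = Path(input_path)
        self.output_path = Path(output_path)
        self.properties = dict(properties or {})

    def build_command(self):
        """Return the argument list for the wrapped tool (defined per block)."""
        raise NotImplementedError

    def launch(self):
        """Check the input, run the wrapped tool, and return its exit code."""
        if not self.input_path.is_file():
            raise FileNotFoundError(f"missing input file: {self.input_path}")
        result = subprocess.run(self.build_command(), capture_output=True, text=True)
        return result.returncode


class UppercaseBlock(CommandLineBlock):
    """Stand-in block: the 'tool' is a one-line script run by the Python interpreter."""

    def build_command(self):
        # The script reads the input file and writes its upper-cased content.
        script = (
            "import sys\n"
            "with open(sys.argv[1]) as src, open(sys.argv[2], 'w') as dst:\n"
            "    dst.write(src.read().upper())\n"
        )
        return [sys.executable, "-c", script, str(self.input_path), str(self.output_path)]
```

Under these assumptions, a workflow step reduces to `UppercaseBlock("in.txt", "out.txt").launch()`, and a second block can take `out.txt` as its input. Keeping the interface to file paths plus a settings dictionary is what would let a workflow manager treat every block the same way.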
