Using bio.tools to generate and annotate workbench tool descriptions

Workbench and workflow systems such as Galaxy, Taverna, Chipster, or Common Workflow Language (CWL)-based frameworks facilitate access to bioinformatics tools in a user-friendly, scalable and reproducible way. Still, integrating tools into such environments remains a cumbersome, time-consuming and error-prone process. A major consequence is that tool descriptions are often incomplete or outdated, missing important information such as parameters and metadata like publications or links to documentation. ToolDog (Tool DescriptiOn Generator) facilitates the integration of tools registered in the ELIXIR tools registry (https://bio.tools) into workbench environments by generating tool description templates. ToolDog includes two modules. The first module analyses the source code of the bioinformatics software with language-specific plugins and generates a skeleton for a Galaxy XML or CWL tool description. The second module enriches the generated tool description with metadata provided by bio.tools. This second module can also be used on its own to complete or correct existing tool descriptions that lack metadata.
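The two-step process described above (build a description skeleton from the software's interface, then enrich it with registry metadata) can be illustrated with a short Python sketch. This is not ToolDog's implementation or API. It assumes a Python tool whose command line is defined with `argparse`, targets Galaxy XML only, and uses a hand-written metadata record whose field names (`description`, `version`, `publication`, `documentation`, `homepage`) are assumed to mirror the bio.tools JSON layout. The tool name, DOI and URLs are hypothetical placeholders. The enrichment function only fills in elements that are missing, so it can also be applied to an existing description.

```python
import argparse
import xml.etree.ElementTree as ET

# Hypothetical metadata record, hand-written to mimic (by assumption) the
# bio.tools JSON layout; a real record would be retrieved from https://bio.tools.
BIOTOOLS_RECORD = {
    "biotoolsID": "exampletool",
    "name": "ExampleTool",
    "description": "Counts sequences in a FASTA file.",
    "version": ["1.0"],
    "homepage": "https://example.org/exampletool",
    "publication": [{"doi": "10.1234/example.doi"}],
    "documentation": [{"url": "https://example.org/exampletool/docs"}],
}


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical command-line interface of the tool being wrapped."""
    parser = argparse.ArgumentParser(prog="exampletool")
    parser.add_argument("--input", help="Input FASTA file")
    parser.add_argument("--min-length", type=int, default=0,
                        help="Minimum sequence length")
    return parser


def skeleton_from_argparse(parser: argparse.ArgumentParser) -> ET.Element:
    """First step (sketch): derive a Galaxy <tool> skeleton from an argparse parser."""
    tool = ET.Element("tool", id=parser.prog, name=parser.prog)
    inputs = ET.SubElement(tool, "inputs")
    command_parts = [parser.prog]
    # parser._actions is private argparse state, read here only for illustration.
    for action in parser._actions:
        # Skip the built-in -h/--help option and, for brevity, positional arguments.
        if "--help" in action.option_strings or not action.option_strings:
            continue
        flag = action.option_strings[0]
        # Simplification: integers become "integer" params, everything else "text".
        param_type = "integer" if action.type is int else "text"
        param = ET.SubElement(inputs, "param", name=action.dest,
                              type=param_type, label=action.help or action.dest)
        if action.default is not None:
            param.set("value", str(action.default))
        # Galaxy command lines reference parameters as Cheetah variables ($name).
        command_parts.append(f"{flag} '${action.dest}'")
    ET.SubElement(tool, "command").text = " ".join(command_parts)
    ET.SubElement(tool, "outputs")
    return tool


def enrich(tool: ET.Element, record: dict) -> ET.Element:
    """Second step (sketch): add registry metadata only where the description lacks it."""
    if "version" not in tool.attrib and record.get("version"):
        tool.set("version", record["version"][0])
    if tool.find("description") is None and record.get("description"):
        description = ET.Element("description")
        description.text = record["description"]
        tool.insert(0, description)
    dois = [p["doi"] for p in record.get("publication", []) if p.get("doi")]
    if dois and tool.find("citations") is None:
        citations = ET.SubElement(tool, "citations")
        for doi in dois:
            ET.SubElement(citations, "citation", type="doi").text = doi
    if tool.find("help") is None:
        links = [record.get("homepage")]
        links += [d.get("url") for d in record.get("documentation", [])]
        help_text = "\n".join(f"- {url}" for url in links if url)
        if help_text:
            ET.SubElement(tool, "help").text = help_text
    return tool


if __name__ == "__main__":
    tool = enrich(skeleton_from_argparse(build_parser()), BIOTOOLS_RECORD)
    ET.indent(tool)  # pretty-printing; requires Python 3.9+
    print(ET.tostring(tool, encoding="unicode"))
```

Reading `parser._actions` relies on a private argparse attribute, and mapping every non-integer option to a `text` parameter is a simplification (a file input would normally be a Galaxy `data` parameter with a format); both shortcuts are acceptable only in an illustration like this one.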
