BioShaDock: a community driven bioinformatics shared Docker-based tools registry

Linux container technologies, as represented by Docker, provide an alternative to the complex and time-consuming installation processes that scientific software often requires. The ease of deployment and process isolation they enable, together with the reproducibility they permit across environments and versions, make them interesting candidates for building bioinformatic infrastructures at any scale, from single workstations to high-throughput computing architectures. The Docker Hub is a public registry that can be used to distribute bioinformatic software as Docker images. However, its lack of curation and its generic scope make it difficult for a bioinformatics user to find the most appropriate images. BioShaDock is a bioinformatics-focused Docker registry that provides a local and fully controlled environment to build and publish bioinformatic software as portable Docker images. It improves on the base Docker registry in authentication and permissions management, which enables its integration into existing bioinformatic infrastructures such as computing platforms. The metadata associated with the registered images are domain-centric, including for instance concepts defined in the EDAM ontology, a shared and structured vocabulary of commonly used terms in bioinformatics. Each registered image can also carry user-defined tags to facilitate its discovery, as well as a link to the tool description in the ELIXIR registry if one already exists. If it does not, BioShaDock synchronizes with the ELIXIR registry to create a new description there, based on the BioShaDock entry metadata. This link helps users get more information on the tool, such as its EDAM operations and its input and output types. It also integrates BioShaDock with the ELIXIR Tools and Data Services Registry, giving these images appropriate visibility in the bioinformatics community.
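The link-or-create behaviour described above can be pictured with a minimal Python sketch. Everything in it is an assumption made for illustration: the `ImageEntry` fields, the `link_to_tool_registry` function, and the in-memory dictionary standing in for the ELIXIR registry are hypothetical and do not reflect the actual BioShaDock or ELIXIR APIs.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ImageEntry:
    """Hypothetical registry entry; field names are illustrative only."""
    name: str
    description: str
    edam_terms: List[str] = field(default_factory=list)  # placeholder EDAM concepts
    tags: List[str] = field(default_factory=list)        # user-defined tags
    tool_registry_id: Optional[str] = None               # link to the tool description


def link_to_tool_registry(entry: ImageEntry, tool_registry: Dict[str, dict]) -> str:
    """Link the entry to an existing tool description, or create one.

    `tool_registry` is a plain dict used as a stand-in for a remote
    tools registry; a real integration would call that service instead.
    """
    if entry.name not in tool_registry:
        # No description yet: create one from the entry metadata.
        tool_registry[entry.name] = {
            "description": entry.description,
            "edam_terms": list(entry.edam_terms),
        }
    # In both cases the entry ends up pointing at the description.
    entry.tool_registry_id = entry.name
    return entry.tool_registry_id


if __name__ == "__main__":
    registry: Dict[str, dict] = {}
    image = ImageEntry(
        name="example-aligner",
        description="Example alignment tool packaged as a Docker image",
        edam_terms=["placeholder-edam-operation"],
        tags=["alignment", "example"],
    )
    print(link_to_tool_registry(image, registry))
    print(registry)
```

In this sketch an existing description is left untouched and only linked, which mirrors the "if it already exists" case in the text; how conflicts between the two registries are actually resolved is not stated in the source.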
