Bio-Docklets: Virtualization Containers for Single-Step Execution of NGS Pipelines

Background

Processing of Next-Generation Sequencing (NGS) data requires significant technical skill: bioinformatics data pipelines must be installed, configured, and executed, and specialized software is needed for post-analysis visualization and data mining. To address some of these challenges, developers have leveraged virtualization containers to deploy preconfigured bioinformatics software and pipelines seamlessly on any computational platform.

Findings

We present an approach for abstracting the complex data operations of multi-step bioinformatics pipelines for NGS data analysis. As examples, we have deployed two pipelines, for RNA-seq and ChIP-seq, preconfigured within Docker virtualization containers that we call Bio-Docklets. Each Bio-Docklet exposes a single data input and output endpoint, so from the user's perspective, running a pipeline is as simple as running a single bioinformatics tool. This is achieved through a “meta-script” that automatically starts the Bio-Docklets and controls pipeline execution through the BioBlend software library and the Galaxy Application Programming Interface (API). The pipeline output is post-processed using the Visual Omics Explorer (VOE) framework, which provides interactive data visualizations that users can access through a web browser.

Conclusions

The goal of our approach is to give non-bioinformaticians easy access to NGS data analysis pipelines in any computing environment, whether a laboratory workstation, a university computer cluster, or a cloud service provider. Beyond end users, Bio-Docklets also enable developers to programmatically deploy and run large numbers of pipeline instances for concurrent analysis of multiple datasets.
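The meta-script pattern can be illustrated with a short Python sketch. It starts one container per dataset and then drives each container's Galaxy server through BioBlend. This is a minimal sketch under stated assumptions, not the Bio-Docklets code. The following are hypothetical placeholders:

- the image name `example/bio-docklet-rnaseq`
- the assumption that Galaxy listens on port 80 inside the container
- the API key and workflow name
- the helper functions `plan_instances`, `docker_run_command`, `workflow_inputs`, `wait_for_galaxy`, `run_pipeline`, and `run_all`

Only the `docker run` flags and the BioBlend calls (`GalaxyInstance`, `histories.create_history`, `tools.upload_file`, `workflows.get_workflows`, `workflows.invoke_workflow`) come from the real tools.

```python
"""Minimal sketch of a Bio-Docklet "meta-script" (hypothetical, not the authors' code).

Assumptions: the image name, the container's Galaxy port (80), the API key,
and the workflow name are placeholders. BioBlend is imported only when a
pipeline is actually run.
"""
import http.client
import subprocess
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Dict, List

IMAGE = "example/bio-docklet-rnaseq"  # hypothetical image name
CONTAINER_PORT = 80                    # assumed port Galaxy listens on inside the container


@dataclass
class Instance:
    name: str       # container name
    host_port: int  # host port mapped to the container's Galaxy server
    reads: str      # local input file analysed by this instance


def plan_instances(reads_files: List[str], base_port: int = 8080) -> List[Instance]:
    """Give each dataset its own container name and host port so runs do not collide."""
    return [Instance(f"bio-docklet-{i}", base_port + i, path)
            for i, path in enumerate(reads_files)]


def docker_run_command(inst: Instance) -> List[str]:
    """Build the `docker run` argument list that starts one detached container."""
    return ["docker", "run", "-d", "--name", inst.name,
            "-p", f"{inst.host_port}:{CONTAINER_PORT}", IMAGE]


def workflow_inputs(dataset_id: str, step: str = "0") -> Dict[str, Dict[str, str]]:
    """Map the workflow's single input step to a history dataset, in BioBlend's format."""
    return {step: {"id": dataset_id, "src": "hda"}}


def wait_for_galaxy(url: str, timeout: float = 600.0) -> None:
    """Poll Galaxy's /api/version endpoint until the server in the container responds."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(f"{url}/api/version", timeout=5):
                return
        except (OSError, http.client.HTTPException):
            time.sleep(5)  # server not up yet; retry
    raise TimeoutError(f"Galaxy at {url} did not respond within {timeout} s")


def run_pipeline(inst: Instance, api_key: str, workflow_name: str) -> str:
    """Upload one input file, invoke the preconfigured workflow, return the invocation ID."""
    from bioblend.galaxy import GalaxyInstance  # requires: pip install bioblend

    url = f"http://localhost:{inst.host_port}"
    wait_for_galaxy(url)
    gi = GalaxyInstance(url=url, key=api_key)
    history = gi.histories.create_history(name=inst.name)
    upload = gi.tools.upload_file(inst.reads, history["id"])
    dataset_id = upload["outputs"][0]["id"]
    workflow = gi.workflows.get_workflows(name=workflow_name)[0]
    invocation = gi.workflows.invoke_workflow(
        workflow["id"], inputs=workflow_inputs(dataset_id), history_id=history["id"])
    return invocation["id"]


def run_all(reads_files: List[str], api_key: str, workflow_name: str) -> List[str]:
    """Start one container per dataset, then drive all pipelines concurrently."""
    instances = plan_instances(reads_files)
    for inst in instances:
        subprocess.run(docker_run_command(inst), check=True)
    with ThreadPoolExecutor(max_workers=len(instances) or 1) as pool:
        return list(pool.map(lambda i: run_pipeline(i, api_key, workflow_name), instances))
```

For example, `run_all(["sampleA.fastq", "sampleB.fastq"], api_key="...", workflow_name="RNAseq")` would start two containers on host ports 8080 and 8081. It would then return one Galaxy invocation ID per dataset. Waiting for the invocations to finish and downloading results (for instance with BioBlend's `gi.datasets.download_dataset`) are left out to keep the sketch short.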
