OnTimeURB: Multi-Cloud Resource Brokering for Bioinformatics Workflows

Scientific workflows, owing to their data- and memory-intensive requirements, are among the prime applications that benefit from cloud computing. However, cloud service providers (CSPs) have distinct policies and service dynamics that confront users with a problem of excess choice. Performance and cost of cloud services are among the principal factors in CSP selection for scientific bioinformatics workflows. These workflows are typically based on private data and require diverse cloud resources, and thus often need synergistic services from multiple CSPs. In this paper, we address this challenge of multi-cloud resource selection using cloud template solutions based on user specifications. We propose an optimizer that incorporates a combinatorial optimization model built on performance, cost and CSP interoperability factors. The optimizer is integrated within a novel resource broker (i.e., OnTimeURB) that gives prescriptive recommendations of template solutions with intuitive choices for users. We implement and evaluate the OnTimeURB recommendation framework with a catalog of bioinformatics workflow applications integrated within a KBCommons science gateway. The evaluation considered resources from four CSPs featuring more than 300 different machine configurations. Our evaluation results show that OnTimeURB consistently creates more economical, performance-optimized and practical cloud solutions than a k-nearest neighbors (k-NN) approach.
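
To make the idea of combinatorial multi-cloud selection concrete, the following is a minimal illustrative sketch and not the OnTimeURB optimizer itself. It assumes a tiny hypothetical catalog, hypothetical per-stage requirements, and a flat cross-CSP penalty as a stand-in for interoperability; performance is treated only as a feasibility constraint on vCPUs and memory. All names and numbers (CATALOG, STAGES, CROSS_CSP_PENALTY, recommend) are invented for illustration.

```python
from itertools import product

# Hypothetical catalog: (csp, instance name, vCPUs, memory GiB, USD per hour).
CATALOG = [
    ("csp_a", "a.small", 2, 8, 0.10),
    ("csp_a", "a.large", 8, 32, 0.40),
    ("csp_b", "b.medium", 4, 16, 0.18),
    ("csp_b", "b.xlarge", 16, 64, 0.90),
]

# Hypothetical minimum requirements per workflow stage: (vCPUs, memory GiB).
STAGES = {"align": (4, 16), "assemble": (8, 32)}

# Assumed flat penalty (USD per hour) for each pair of consecutive
# stages placed on different CSPs; a crude proxy for interoperability.
CROSS_CSP_PENALTY = 0.05


def feasible(instance, requirement):
    """Return True if the instance meets the stage's vCPU and memory minimums."""
    _, _, vcpus, mem, _ = instance
    return vcpus >= requirement[0] and mem >= requirement[1]


def recommend(catalog, stages, penalty):
    """Exhaustively search one-instance-per-stage plans for the lowest score."""
    names = list(stages)
    options = [[i for i in catalog if feasible(i, stages[n])] for n in names]
    best_plan, best_score = None, float("inf")
    for combo in product(*options):
        cost = sum(i[4] for i in combo)
        # Count consecutive stage pairs that cross a CSP boundary.
        hops = sum(a[0] != b[0] for a, b in zip(combo, combo[1:]))
        score = cost + penalty * hops
        if score < best_score:
            best_plan, best_score = dict(zip(names, combo)), score
    return best_plan, best_score


if __name__ == "__main__":
    plan, score = recommend(CATALOG, STAGES, CROSS_CSP_PENALTY)
    for stage, inst in plan.items():
        print(stage, inst[0], inst[1])
    print(round(score, 2))
```

Exhaustive enumeration is workable only for toy inputs like this one; with hundreds of machine configurations and several stages, the search space grows multiplicatively, which is the kind of setting where a dedicated optimization model becomes necessary.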
