Hermes: Seamless delivery of containerized bioinformatics workflows in hybrid cloud (HTC) environments

Abstract: Hermes introduces a “describe once, run anywhere” paradigm for executing bioinformatics workflows in hybrid cloud environments. It combines the features of parallelization-enabled workflow management systems with those of distributed computing platforms in a container-based approach. It offers seamless deployment, removing the burden of setting up and configuring software and network requirements. Most importantly, Hermes fosters the reproducibility of scientific workflows by standardizing the software execution environment, which leads to consistent workflow results and accelerates scientific output.
