Skyport - Container-Based Execution Environment Management for Multi-cloud Scientific Workflows

Linux container technology has recently been gaining attention because it promises to transform the way software is developed and deployed. Their portability and ease of deployment make Linux containers an ideal technology for scientific workflow platforms. Skyport uses Docker Linux containers to solve the software deployment problems and resource utilization inefficiencies inherent to all existing scientific workflow platforms. Skyport is an extension to AWE/Shock, our data analysis platform that provides scalable workflow execution environments for scientific data in the cloud, and it greatly reduces the complexity of providing the environment needed to execute complex workflows.
