Deploying Bioinformatics Workflows on Clouds with Galaxy and Globus Provision

Cloud computing is attracting increasing attention as a means of providing users with fast provisioning of computational and storage resources, elastic scaling, and pay-as-you-go pricing. The integration of scientific workflows and Cloud computing has the potential to significantly improve resource utilization, processing speed, and user experience. This paper proposes a novel approach for deploying bioinformatics workflows in Cloud environments using Galaxy, a platform for scientific workflows, and Globus Provision, a tool for deploying distributed computing clusters on Amazon EC2. Collectively, this combination of tools provides an easy-to-use, high-performance, and scalable workflow environment that addresses the needs of data-intensive applications through dynamic cluster configuration, automatic user-defined node provisioning, high-speed data transfer, and automated deployment and configuration of domain-specific software. To demonstrate how this approach can be used in practice, we present a domain-specific workflow use case and a performance evaluation.
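The abstract lists automatic, user-defined node provisioning and elastic scaling among the environment's features but does not describe how a cluster decides how many worker nodes to run. The following is a minimal sketch under an assumed threshold policy. It is not taken from the paper and is not Galaxy's or Globus Provision's API. The policy sizes the worker pool from the current job load and clamps the result to bounds the user sets. The names `ScalingPolicy`, `desired_nodes` and `scaling_action`, and every parameter value, are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class ScalingPolicy:
    """Hypothetical user-defined limits for a cluster's worker pool."""
    min_nodes: int = 1       # never shrink below this many workers
    max_nodes: int = 8       # hard cap, which bounds pay-as-you-go cost
    jobs_per_node: int = 4   # concurrent jobs one worker is assumed to handle


def desired_nodes(pending_jobs: int, running_jobs: int, policy: ScalingPolicy) -> int:
    """Return how many worker nodes the cluster should have for the current load."""
    if pending_jobs < 0 or running_jobs < 0:
        raise ValueError("job counts must be non-negative")
    total = pending_jobs + running_jobs
    # Enough nodes to run every queued and running job at the assumed density.
    needed = math.ceil(total / policy.jobs_per_node)
    # Clamp to the user-defined bounds.
    return max(policy.min_nodes, min(policy.max_nodes, needed))


def scaling_action(current_nodes: int, pending_jobs: int, running_jobs: int,
                   policy: ScalingPolicy) -> int:
    """Positive: nodes to add; negative: nodes to remove; zero: no change."""
    return desired_nodes(pending_jobs, running_jobs, policy) - current_nodes


if __name__ == "__main__":
    policy = ScalingPolicy(min_nodes=1, max_nodes=8, jobs_per_node=4)
    # 18 jobs at 4 per node need 5 nodes; with 2 running, add 3.
    print(scaling_action(current_nodes=2, pending_jobs=10, running_jobs=8, policy=policy))
```

The clamp is what makes the policy "user-defined". The user trades responsiveness (`max_nodes`) against cost, and the load-driven term supplies the elasticity. A real deployment would also need to avoid removing nodes that are still running jobs, which this sketch does not model.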