Kepler + CometCloud: Dynamic Scientific Workflow Execution on Federated Cloud Resources

The availability and variety of cloud offerings and their associated access models have grown drastically over the past few years. It is now common for users to have access to multiple infrastructures (e.g., campus clusters, cloud resources); however, deploying complex application workflows on top of these resources remains a challenge. In this paper, we propose an approach that allows users to build and run scientific workflows on top of a federation of multiple clouds and traditional resources (e.g., clusters). We achieve this by integrating the Kepler scientific workflow platform with the CometCloud framework. This integration allows us to: 1) dynamically and programmatically provision and aggregate resources, 2) easily compose complex workflows, and 3) dynamically schedule and execute these workflows on the resulting federation of resources, based on provenance and overall objectives. We demonstrate our approach and evaluate its capabilities by running a bioinformatics workflow on top of a federation composed of a campus cluster and two clouds.
