End-to-End Delay Minimization for Scientific Workflows in Clouds under Budget Constraint

Next-generation e-Science features large-scale, compute-intensive workflows comprising many computing modules that are typically executed in a distributed manner. With the recent emergence of cloud computing and the rapid deployment of cloud infrastructures, an increasing number of scientific workflows have been shifted, or are in active transition, to cloud environments. Because cloud computing turns computing into a utility, scientists across application domains face the same challenge of reducing financial cost in addition to meeting the traditional goal of performance optimization. We develop a prototype generic workflow system by leveraging existing technologies for quick evaluation of scientific workflow optimization strategies. We construct analytical models to quantify the network performance of scientific workflows using cloud-based computing resources, and formulate a task scheduling problem to minimize the workflow end-to-end delay under a user-specified financial constraint. We rigorously prove that the proposed problem is not only NP-complete but also non-approximable. We design a heuristic solution to this problem and illustrate its performance superiority over existing methods through extensive simulations and real-life workflow experiments based on a proof-of-concept implementation and deployment in a local cloud testbed.
