An Experimental Study of Data Transfer Strategies for Execution of Scientific Workflows

The paper studies the impact of data transfer strategies on the execution of scientific workflows. It describes five strategies that define when and in what order data transfers are performed during workflow execution. The strategies are evaluated experimentally by means of simulation using a realistic network model. The results show that the execution time of data-intensive workflows depends significantly on the strategy used. In particular, the Eager and Lazy strategies, which are often used in the theory and practice of workflow scheduling, perform poorly in most cases. The alternative strategies improve the makespan by up to 36% by overlapping communication with computation, prioritizing data transfers, and reducing network contention.
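The contention and prioritization argument can be illustrated with a deliberately simplified model. The sketch below is not the paper's simulator or its network model. It assumes a single link whose bandwidth is split equally among concurrent transfers (idealized fair sharing) and compares starting all transfers at once, as happens when every output is sent as soon as it is produced, with sending the transfers one at a time in priority order. The function names, file sizes, bandwidth, and ready times are hypothetical.

```python
# Hypothetical illustration: all names, sizes, and rates are assumptions,
# not taken from the paper or its simulator.

def fair_share_finish_times(sizes, bandwidth):
    """Finish times of transfers that all start at t=0 on one link whose
    bandwidth is split equally among active transfers (idealized fair sharing)."""
    finish = [0.0] * len(sizes)
    t = 0.0
    sent = 0.0          # data each still-active transfer has sent so far
    active = len(sizes)
    # Under equal sharing, the smallest remaining transfer always completes next.
    for i in sorted(range(len(sizes)), key=lambda k: sizes[k]):
        t += (sizes[i] - sent) * active / bandwidth
        sent = sizes[i]
        finish[i] = t
        active -= 1
    return finish


def sequential_finish_times(sizes, bandwidth, order):
    """Finish times when transfers use the full link one at a time in `order`."""
    finish = [0.0] * len(sizes)
    t = 0.0
    for i in order:
        t += sizes[i] / bandwidth
        finish[i] = t
    return finish


def start_times(ready, finish):
    """A consumer task starts once its node is free and its input has arrived."""
    return [max(r, f) for r, f in zip(ready, finish)]


if __name__ == "__main__":
    sizes = [300.0, 300.0]   # MB; file 0 is needed first
    bandwidth = 100.0        # MB/s
    ready = [0.0, 6.0]       # s; when each consumer's node becomes free

    concurrent = fair_share_finish_times(sizes, bandwidth)
    prioritized = sequential_finish_times(sizes, bandwidth, order=[0, 1])
    print("concurrent :", start_times(ready, concurrent))   # [6.0, 6.0]
    print("prioritized:", start_times(ready, prioritized))  # [3.0, 6.0]
```

With both transfers sharing the link, the first consumer cannot start until t = 6 s. Sending the urgently needed file first lets it start at t = 3 s without delaying the second consumer. Real networks are more complex than a single fairly shared link, so the sketch conveys only the direction of the effect, not its magnitude.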
