WRENCH: A Framework for Simulating Workflow Management Systems

Scientific workflows are used routinely in numerous scientific domains, and Workflow Management Systems (WMSs) have been developed to orchestrate and optimize workflow executions on distributed platforms. WMSs are complex software systems that interact with complex software infrastructures. Most WMS research and development activities rely on empirical experiments conducted with full-fledged software stacks on actual hardware platforms. Such experiments, however, are limited to the hardware and software infrastructures at hand and can be labor- and/or time-intensive. As a result, relying solely on real-world experiments impedes WMS research and development. An alternative is to conduct experiments in simulation. In this work we present WRENCH, a WMS simulation framework whose objectives are (i) accurate and scalable simulations; and (ii) easy simulation software development. WRENCH achieves its first objective by building on the SimGrid framework. While SimGrid is recognized for the accuracy and scalability of its simulation models, it provides only low-level simulation abstractions, so implementing simulators of complex systems requires large software development efforts. WRENCH achieves its second objective by providing high-level and directly reusable simulation abstractions on top of SimGrid. After describing and giving rationales for WRENCH's software architecture and APIs, we present a case study in which we apply WRENCH to simulate the Pegasus production WMS. We report on ease of implementation, simulation accuracy, and simulation scalability to determine to what extent WRENCH achieves these two objectives. We also draw both qualitative and quantitative comparisons with a previously proposed workflow simulator.
