Measuring the Effectiveness of Throttled Data Transfers on Data-Intensive Workflows

In data-intensive workflows, which often operate on files, data is typically transferred between tasks as fast as the network links allow, and the transferred files are then buffered or stored at their destination. A task that requires multiple files from different preceding tasks must remain idle until all of them are available. Hence, network bandwidth and buffer/storage within a workflow are often used ineffectively. In this paper, we quantitatively measure the impact that an intelligent data movement policy can have on buffer/storage use in comparison with existing approaches. Our main objective is to propose a metric that considers the workflow structure, expressed as a Directed Acyclic Graph (DAG), together with performance information collected from past executions of the workflow. The metric is intended for use at the design stage, to compare alternative DAG structures and evaluate their potential for optimisation of network bandwidth and buffer use.
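
The idle-wait effect described above can be illustrated with a minimal sketch. It is not the metric proposed in the paper: the function names, the megabyte-second cost measure, and the example numbers are all assumptions made for illustration. The sketch assumes every transfer to a task starts at time zero, and it counts only the time a fully arrived file waits in the buffer for its slower siblings.

```python
def arrival_times(sizes_mb, rates_mb_s):
    """Time (s) at which each input file has fully arrived."""
    return [size / rate for size, rate in zip(sizes_mb, rates_mb_s)]


def buffered_mb_seconds(sizes_mb, rates_mb_s):
    """Illustrative buffer cost: size of each file times the time it waits.

    The consuming task can only start once the last file has arrived, so
    every earlier file occupies buffer space until then.
    """
    arrivals = arrival_times(sizes_mb, rates_mb_s)
    task_start = max(arrivals)
    return sum(size * (task_start - arrived)
               for size, arrived in zip(sizes_mb, arrivals))


def throttled_rates(sizes_mb, rates_mb_s):
    """Slow each transfer so that all files arrive when the slowest one does.

    The task start time is unchanged, but no file waits in the buffer and
    the faster links need less bandwidth.
    """
    task_start = max(arrival_times(sizes_mb, rates_mb_s))
    return [size / task_start for size in sizes_mb]


# Hypothetical task with two inputs sent over equally fast links.
sizes = [100.0, 400.0]        # MB
full_speed = [100.0, 100.0]   # MB/s

unthrottled_cost = buffered_mb_seconds(sizes, full_speed)
slowed = throttled_rates(sizes, full_speed)
throttled_cost = buffered_mb_seconds(sizes, slowed)

print(unthrottled_cost, slowed, throttled_cost)
```

In this hypothetical case the smaller file arrives after 1 s and waits 3 s for the larger one. Throttling its transfer to a quarter of the link rate removes that wait without delaying the task.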
