Paper information - Measuring the Effectiveness of Throttled Data Transfers on Data-Intensive Workflows

In data-intensive workflows, which often involve files, data is typically transferred between tasks as fast as the network links allow, and once transferred, the files are buffered or stored at their destination. A task that requires multiple files from different preceding tasks must remain idle until all of those files are available. Hence, network bandwidth and buffer/storage within a workflow are often not used effectively. In this paper, we quantitatively measure the impact that an intelligent data movement policy can have on buffer/storage use in comparison with existing approaches. Our main objective is to propose a metric that considers the workflow structure, expressed as a Directed Acyclic Graph (DAG), together with performance information collected from past executions of that workflow. The metric is intended for use at the design stage, to compare alternative DAG structures and evaluate their potential for optimisation of network bandwidth and buffer use.
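The buffering argument in the abstract can be illustrated with a minimal sketch. This is not the metric proposed in the paper; it is a toy model under stated assumptions: a single join task with two input files, constant transfer rates, and made-up sizes and bandwidths. The names `Transfer`, `throttled_rate`, and `buffer_mb_seconds` are hypothetical and exist only for this illustration. The sketch compares destination buffer occupancy (in MB·s) when every file is sent at full link speed against the case where each file is slowed so that it arrives just as the join task can start.

```python
from dataclasses import dataclass


@dataclass
class Transfer:
    size_mb: float     # file size in MB (made-up value)
    ready_at_s: float  # time at which the producing task finishes
    link_mbps: float   # maximum link rate in MB/s (made-up value)


def join_start(transfers):
    """Earliest time the join task can start: the last arrival at full speed."""
    return max(t.ready_at_s + t.size_mb / t.link_mbps for t in transfers)


def throttled_rate(t, deadline):
    """Slowest constant rate that still delivers the file by the deadline."""
    return t.size_mb / (deadline - t.ready_at_s)


def buffer_mb_seconds(t, rate, deadline):
    """Destination buffer occupancy (MB*s) from transfer start to the deadline."""
    duration = t.size_mb / rate
    done = t.ready_at_s + duration
    filling = 0.5 * t.size_mb * duration  # buffer grows linearly during transfer
    waiting = t.size_mb * (deadline - done)  # whole file sits idle afterwards
    return filling + waiting


# Hypothetical join task with two inputs sharing the same link speed.
transfers = [
    Transfer(size_mb=100, ready_at_s=0, link_mbps=10),
    Transfer(size_mb=400, ready_at_s=0, link_mbps=10),
]
deadline = join_start(transfers)

full_speed = sum(buffer_mb_seconds(t, t.link_mbps, deadline) for t in transfers)
throttled = sum(
    buffer_mb_seconds(t, throttled_rate(t, deadline), deadline) for t in transfers
)

print(f"join task can start at t = {deadline} s")
print(f"buffer use at full speed: {full_speed} MB*s")
print(f"buffer use when throttled: {throttled} MB*s")
```

With these assumed values the join task cannot start before t = 40 s in either case, but throttling lowers the smaller file's rate from 10 MB/s to 2.5 MB/s and reduces total buffer occupancy from 11500 MB·s to 10000 MB·s, which is the kind of saving the abstract refers to.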

Ricardo J. Rodríguez | Omer F. Rana | Rafael Tolosana-Calasanz
