A Data-Intensive Workflow Scheduling Algorithm for Grid Computing

Data-intensive workflows have become increasingly common in scientific and enterprise grids. Such a workflow must access, process, and transfer large datasets, each of which may be replicated on several data hosts. Because the datasets are large, execution time is dominated by the cost of data transfer. Minimizing the time needed to transfer these datasets to the computational resources where the workflow's tasks execute requires selecting appropriate computational and data resources. In this paper, we introduce an algorithm, MDTT, that selects the resource set to which each task should be mapped. Our experiments show that the algorithm minimizes both the total makespan of data-intensive workflows and the data transfer time.
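To make the resource-selection problem concrete, the sketch below picks, for a single task, the compute host and one replica per dataset that together give the smallest total transfer time. It is not the MDTT algorithm, whose details are not given here; it is a simple exhaustive illustration. The function name `select_resources`, the input layout, the host names, and the assumption that datasets are transferred one after another (so their times add up) are all hypothetical.

```python
def select_resources(datasets, replicas, bandwidth, compute_hosts):
    """Pick the compute host and per-dataset replica with the least transfer time.

    datasets:      {dataset name: size in MB}
    replicas:      {dataset name: [data hosts holding a copy]}
    bandwidth:     {(data host, compute host): MB per second}
    compute_hosts: candidate compute hosts for the task

    Assumption (illustrative only): transfers run sequentially, so the
    total transfer time is the sum of the per-dataset times.
    """
    best = None
    for compute in compute_hosts:
        chosen = {}
        total = 0.0
        for name, size in datasets.items():
            # For this compute host, the fastest link gives the shortest transfer.
            host = max(replicas[name], key=lambda h: bandwidth[(h, compute)])
            chosen[name] = host
            total += size / bandwidth[(host, compute)]
        if best is None or total < best[0]:
            best = (total, compute, chosen)
    return best


# Hypothetical example: two datasets, two data hosts, two compute hosts.
datasets = {"d1": 1000, "d2": 500}
replicas = {"d1": ["A", "B"], "d2": ["B"]}
bandwidth = {
    ("A", "c1"): 100, ("B", "c1"): 50,
    ("A", "c2"): 20,  ("B", "c2"): 100,
}

result = select_resources(datasets, replicas, bandwidth, ["c1", "c2"])
print(result)  # (15.0, 'c2', {'d1': 'B', 'd2': 'B'})
```

In this example, host c1 would need 20 seconds (10 for each dataset), while c2 needs 15 seconds by reading both datasets from host B. A workflow scheduler would additionally have to account for task dependencies and computation time, which this sketch leaves out.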
