A Two-Step Data Placement and Task Scheduling Strategy for Optimizing Scientific Workflow Performance on Cloud Computing Platform
Scientific workflows are increasingly run in collaborative cloud environments, and there is an urgent need to reduce the large volumes of data transferred across geo-distributed data centers during workflow execution. By exploiting data dependencies, we propose a two-stage data placement strategy and a task scheduling strategy for efficient workflow execution. At workflow build-time, the most closely related datasets are placed in the same data center according to the data dependencies between them. At workflow runtime, each task is scheduled to the data center it is most closely related to, and each newly generated dataset is placed in the data center with which it has the strongest dependency. Experimental results show that the proposed strategy significantly reduces the volume of data transferred between data centers, and hence improves the performance of scientific workflows and lowers the cost of doing science on the cloud.
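The two stages described above can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the abstract does not say how dependency is measured or how datasets are grouped, so the sketch assumes that the dependency between two datasets is the number of tasks that read both, that placement is greedy, and that each data center has a fixed storage capacity. All function and variable names are hypothetical.

```python
from collections import defaultdict


def dependency(tasks, a, b):
    """Assumed measure: number of tasks whose inputs include both a and b."""
    return sum(1 for inputs in tasks.values() if a in inputs and b in inputs)


def most_dependent_center(d, size, placement, tasks, used, capacity):
    """Pick the center whose stored datasets have the highest total dependency with d."""
    score = defaultdict(int)
    for other, center in placement.items():
        score[center] += dependency(tasks, d, other)
    # Only centers with enough free storage are candidates (assumed constraint).
    candidates = [c for c in capacity if used[c] + size <= capacity[c]]
    if not candidates:
        raise ValueError(f"no data center can hold dataset {d!r}")
    # Ties are broken in favor of the less loaded center.
    return max(candidates, key=lambda c: (score[c], -used[c]))


def place_at_build_time(sizes, tasks, capacity):
    """Build-time stage: greedily co-locate datasets that are used together."""
    placement, used = {}, {c: 0 for c in capacity}
    for d in sorted(sizes, key=lambda name: -sizes[name]):  # largest first
        center = most_dependent_center(d, sizes[d], placement, tasks, used, capacity)
        placement[d] = center
        used[center] += sizes[d]
    return placement, used


def schedule_task(inputs, placement, sizes):
    """Runtime stage: run the task where most of its input data already resides.

    Returns the chosen center and the data volume that must be moved to it.
    """
    volume = defaultdict(int)
    for d in inputs:
        volume[placement[d]] += sizes[d]
    center = max(sorted(volume), key=volume.get)
    moved = sum(sizes[d] for d in inputs if placement[d] != center)
    return center, moved
```

A dataset generated at runtime can be handled with the same `most_dependent_center` helper, using the tasks that will later consume it to compute its dependencies. The moved volume returned by `schedule_task` gives a rough proxy for the cross-center transfer that the strategy aims to reduce.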