Structure aware resource estimation for effective scheduling and execution of data intensive workflows in cloud

Abstract A set of interdependent tasks that automates a business or scientific process can be modelled as a workflow and represented as a Directed Acyclic Graph (DAG) or in the XML-based DAX format (Directed Acyclic Graph in XML). Cloud computing provides hardware and software resources that are accessible from anywhere at any time. Because cloud users are relieved of the burden of managing those resources, the cloud is a convenient and suitable environment for executing workflows. Workflows that accept and process large amounts of data are termed data intensive workflows. The execution cost of such workflows in the cloud depends not only on the configuration of the Virtual Machines (VMs) but also on the cost of data transfer between tasks. Because the arrangement of tasks varies widely from one workflow to another, deciding the optimum configuration and the exact number of VMs is a significant challenge. Hence, this paper proposes a resource provisioning and scheduling mechanism based on the structure of the workflow. The contribution of this work is to identify the required number of VMs and their configuration from the structure of the workflow while optimizing data transfer between tasks. The well-known Montage, CyberShake, Epigenomics and Inspiral workflows are used to evaluate the approach, and the results confirm that the proposed workflow scheduler provides a notable reduction in execution cost without compromising execution time.
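
To make the idea of structure-based VM estimation concrete, the following is a minimal sketch and not the algorithm proposed in the paper. It assumes the workflow is given as a dictionary that maps each task to the tasks depending on it, places each task at a level equal to its longest distance from an entry task, and takes the widest level as a rough upper bound on the number of VMs that could run in parallel. The function names `level_widths` and `estimate_vm_count` and the sample workflow are hypothetical.

```python
from collections import defaultdict, deque


def level_widths(dag):
    """Return the number of tasks at each level of a workflow DAG.

    `dag` maps each task to the list of tasks that depend on it
    (an assumed input format, not one taken from the paper).
    A task's level is its longest distance from any entry task.
    """
    indegree = defaultdict(int)
    tasks = set(dag)
    for children in dag.values():
        for child in children:
            indegree[child] += 1
            tasks.add(child)

    # Entry tasks have no parents and sit at level 0.
    level = {task: 0 for task in tasks if indegree[task] == 0}
    queue = deque(sorted(level))
    processed = 0

    # Kahn's algorithm: a task is dequeued only after all its parents.
    while queue:
        task = queue.popleft()
        processed += 1
        for child in dag.get(task, []):
            level[child] = max(level.get(child, 0), level[task] + 1)
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)

    if processed != len(tasks):
        raise ValueError("workflow graph contains a cycle")

    widths = defaultdict(int)
    for lvl in level.values():
        widths[lvl] += 1
    return [widths[i] for i in range(len(widths))]


def estimate_vm_count(dag):
    """Widest level: a simple upper bound on useful parallel VMs."""
    widths = level_widths(dag)
    return max(widths) if widths else 0


# Hypothetical fork-join workflow: one entry task, three parallel
# tasks, and one task that merges their outputs.
sample = {
    "split": ["work1", "work2", "work3"],
    "work1": ["merge"],
    "work2": ["merge"],
    "work3": ["merge"],
    "merge": [],
}
print(level_widths(sample))
print(estimate_vm_count(sample))
```

This sketch considers only the shape of the graph. A scheduler for data intensive workflows would also have to weigh task runtimes, VM configurations and the cost of moving data between tasks, none of which is modelled here.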
