Multi-Objective Scientific-Workflow Scheduling With Data Movement Awareness in Cloud

Because they must serve several purposes at once, scientific workflows running in dynamic environments such as cloud computing pose a multi-objective scheduling problem. Among these purposes, Cost and Makespan are probably the two most basic objectives. Another critical factor in a large-scale scientific workflow is the tremendous amount of data involved during execution. This work therefore includes Data Movement as an additional objective, since it has a major impact on network utilization and on the energy consumed by network equipment in a cloud data center. To address these three objectives, this work proposes a scheduling framework that combines a new node clustering technique for the Directed Acyclic Graph (DAG) model, known as Multilevel Dependent Node Clustering (MDNC), with a multi-objective optimization algorithm, Extreme Nondominated Sorting Genetic Algorithm-III (E-NSGA-III). E-NSGA-III is a recent extension of the Nondominated Sorting Genetic Algorithm-III (NSGA-III). Five well-known scientific workflows, CyberShake, Epigenomics, LIGO, Montage, and SIPHT, are selected as testbeds, and the widely used Hypervolume is chosen as the performance metric. MDNC is also evaluated with both NSGA-III variants. Three approaches are compared: E-NSGA-III alone, E-NSGA-III with Peer-to-Peer clustering, and E-NSGA-III with MDNC. The superiority of the proposed framework among them and its limitations are discussed.
