On a multi-objective evolutionary algorithm for optimizing end-to-end performance of scientific workflows in distributed environments

Large-scale distributed scientific workflows require system resources that are geographically scattered and typically shared by many users over wide-area network connections. These domain-specific workflows have different end-to-end performance requirements, so mapping them onto heterogeneous network environments requires optimizing multiple objectives at once. We construct mathematical models for workflow mapping and formulate it as a multi-objective optimization problem that minimizes latency and maximizes throughput. We propose a workflow mapping solution based on a multi-objective genetic algorithm that uses a chromosome scheme to represent candidate workflow mappings and employs the genetic operators of selection, crossover and mutation to steer the evolution process. Extensive simulations illustrate the performance superiority of the proposed mapping solution over existing workflow mapping methods.
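The abstract does not specify the chromosome encoding, the operators, or the cost models, so the following is only a minimal sketch of the general idea, under assumptions of our own. It assumes a linear pipeline of modules, a fully connected network with a single uniform bandwidth, and a chromosome in which gene i holds the node assigned to module i. Throughput is treated as the reciprocal of the bottleneck time, and modules that share a node are assumed to add up their compute times. All names and numbers (`WORK`, `DATA`, `POWER`, `BANDWIDTH`, `evolve`) are hypothetical, and the selection scheme is a plain dominance tournament with elitism, not necessarily the one used in the paper.

```python
import random

# Hypothetical toy instance (not from the paper).
WORK = [4.0, 8.0, 6.0, 2.0]   # computational load of each pipeline module
DATA = [10.0, 5.0, 3.0]       # data sent from module i to module i + 1
POWER = [1.0, 2.0, 4.0]       # processing power of each network node
BANDWIDTH = 5.0               # assumed uniform bandwidth between distinct nodes

def objectives(chrom):
    """Return (latency, bottleneck), both minimized; throughput = 1 / bottleneck."""
    compute = [WORK[i] / POWER[n] for i, n in enumerate(chrom)]
    # Modules placed on the same node are assumed to exchange data for free.
    transfer = [DATA[i] / BANDWIDTH if chrom[i] != chrom[i + 1] else 0.0
                for i in range(len(DATA))]
    latency = sum(compute) + sum(transfer)
    # Assumption: modules sharing a node add up their compute times.
    node_load = [sum(t for t, n in zip(compute, chrom) if n == k)
                 for k in range(len(POWER))]
    return latency, max(node_load + transfer)

def dominates(a, b):
    """Pareto dominance for minimization: a is no worse everywhere and differs."""
    return all(x <= y for x, y in zip(a, b)) and a != b

def pareto_front(pop):
    """Map each distinct non-dominated chromosome to its objective values."""
    scored = {tuple(c): objectives(c) for c in pop}
    return {c: f for c, f in scored.items()
            if not any(dominates(g, f) for g in scored.values())}

def tournament(pop, rng):
    """Binary tournament: prefer the candidate that dominates the other."""
    a, b = rng.sample(pop, 2)
    return b if dominates(objectives(b), objectives(a)) else a

def crossover(p1, p2, rng):
    """One-point crossover of two parent mappings."""
    cut = rng.randrange(1, len(p1))
    return p1[:cut] + p2[cut:]

def mutate(chrom, rng, rate=0.1):
    """Reassign each module to a random node with probability `rate`."""
    return [rng.randrange(len(POWER)) if rng.random() < rate else g
            for g in chrom]

def evolve(pop_size=30, generations=40, seed=0):
    """Evolve mappings and return the final non-dominated set."""
    rng = random.Random(seed)
    pop = [[rng.randrange(len(POWER)) for _ in WORK] for _ in range(pop_size)]
    for _ in range(generations):
        elite = [list(c) for c in pareto_front(pop)]  # keep non-dominated mappings
        children = [mutate(crossover(tournament(pop, rng),
                                     tournament(pop, rng), rng), rng)
                    for _ in range(pop_size - len(elite))]
        pop = elite + children
    return pareto_front(pop)
```

Calling `evolve()` returns a dictionary from each surviving mapping to its `(latency, bottleneck)` pair. In this toy instance the two objectives conflict: placing every module on the fastest node minimizes latency but concentrates all computation on one node, while spreading modules across nodes lowers the bottleneck at the cost of extra transfers.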
